Re: [ClusterLabs] Antw: Doing reload right

2016-07-13 Thread Andrew Beekhof
On Sat, Jul 2, 2016 at 1:26 AM, Ken Gaillot  wrote:
> On 07/01/2016 04:48 AM, Jan Pokorný wrote:
>> On 01/07/16 09:23 +0200, Ulrich Windl wrote:
>>> Ken Gaillot wrote on 30.06.2016 at 18:58 in message
>>> <57754f9f.8070...@redhat.com>:
>>>> I've been meaning to address the implementation of "reload" in Pacemaker
>>>> for a while now, and I think the next release will be a good time, as it
>>>> seems to be coming up more frequently.
>>>>
>>>> In the current implementation, Pacemaker considers a resource parameter
>>>> "reloadable" if the resource agent supports the "reload" action, and the
>>>> agent's metadata marks the parameter with "unique=0". If (only) such
>>>> parameters get changed in the resource's pacemaker configuration,
>>>> pacemaker will call the agent's reload action rather than the
>>>> stop-then-start it usually does for parameter changes.
>>>>
>>>> This is completely broken for two reasons:
>>>
>>> I agree ;-)
>>>

>>>> 1. It relies on "unique=0" to determine reloadability. "unique" was
>>>> originally intended (and is widely used by existing resource agents) as
>>>> a hint to UIs to indicate which parameters uniquely determine a resource
>>>> instance. That is, two resource instances should never have the same
>>>> value of a "unique" parameter. For this purpose, it makes perfect sense
>>>> that (for example) the path to a binary command would have unique=0 --
>>>> multiple resource instances could (and likely would) use the same
>>>> binary. However, such a parameter could never be reloadable.
>>>
>>> I thought unique=0 were reloadable (unique=1 were not)...
>
> Correct. By "could never be reloadable", I mean that if someone changes
> the location of the daemon binary, there's no way the agent could change
> that with anything other than a full restart. So using unique=0 to
> indicate reloadable doesn't make sense.
>
>> I see a doubly-distorted picture here:
>> - actually "unique=1" on a RA parameter (together with this RA supporting
>>   "reload") currently leads to reload-on-change
>> - also the provided example shows why reload for "unique=0" is wrong,
>>   but as the opposite applies as of current state, it's not an argument
>>   why something is broken
>>
>> See also:
>> https://github.com/ClusterLabs/pacemaker/commit/2f5d44d4406e9a8fb5b380cb56ab8a70d7ad9c23
>
> Nope, unique=1 is used for the *restart* list -- the non-reloadable
> parameters.
>
>>>> 2. Every known resource agent that implements a reload action does so
>>>> incorrectly. Pacemaker uses reload for changes in the resource's
>>>> *pacemaker* configuration, while all known RAs use reload for a
>>>> service's native reload capability of its own configuration file. As an
>>>> example, the ocf:heartbeat:named RA calls "rndc reload" for its reload
>>>> action, which will have zero effect on any pacemaker-configured
>>>> parameters -- and on top of that, the RA uses "unique=0" in its correct
>>>> UI sense, and none of those parameters are actually reloadable.
>>
>> (per the last subclause, applicable also, after mentioned inversion, for
>> "unique=1", such as a pid file path, which cannot be reloadable for
>> apparent reason)
>>
>>> Maybe LSB confusion...
>>
>> That's not entirely fair vindication, as when you have to do some
>> extra actions with parameters in LSB-aliased "start" action in the
>> RA, you should do such reflections also for "reload".
>
> I think the point is that "reload" for an LSB init script or systemd
> unit always reloads the native service configuration, so it's natural
> for administrators and developers to think of that when they see "reload".

But also because LSB have no parameters to think about.

>
>>>> My proposed solution is:
>>>>
>>>> * Add a new "reloadable" attribute for resource agent metadata, to
>>>> indicate reloadable parameters. Pacemaker would use this instead of
>>>> "unique".
>>>
>>> No objections if you change the XML metadata version number this time ;-)
>>
>> Good point, but I guess everyone's a bit scared to open this Pandora
>> box as there's so much technical debt connected to that (unifying FA/RA
>> metadata if possible, adding new UI-oriented annotations, pacemaker's
>> silent additions like "private" parameter).
>> I'd imagine an established authority for OCF matters (and maintaining
>> https://github.com/ClusterLabs/OCF-spec) and at least partly formalized
>> process inspired by Python PEPs for coordinated development:
>> https://www.python.org/dev/peps/pep-0001/
>>
>> 
>
> An update to the OCF spec is long overdue. I wouldn't mind those wheels
> starting to turn, but I think this reload change could proceed
> independently (though of course coordinated at the appropriate time).
>
>>>> * Add a new "reload-options" RA action for the ability to reload
>>>> Pacemaker-configured options. Pacemaker would call this instead of
>>>> "reload".
>>>
>>> Why not "reload-parameters"?
>>
>> That came to my mind as well.  Or not 

Re: [ClusterLabs] Antw: Doing reload right

2016-07-13 Thread Andrew Beekhof
On Fri, Jul 1, 2016 at 7:48 PM, Jan Pokorný  wrote:
> On 01/07/16 09:23 +0200, Ulrich Windl wrote:
>> Ken Gaillot wrote on 30.06.2016 at 18:58 in message
>> <57754f9f.8070...@redhat.com>:
>>> I've been meaning to address the implementation of "reload" in Pacemaker
>>> for a while now, and I think the next release will be a good time, as it
>>> seems to be coming up more frequently.
>>>
>>> In the current implementation, Pacemaker considers a resource parameter
>>> "reloadable" if the resource agent supports the "reload" action, and the
>>> agent's metadata marks the parameter with "unique=0". If (only) such
>>> parameters get changed in the resource's pacemaker configuration,
>>> pacemaker will call the agent's reload action rather than the
>>> stop-then-start it usually does for parameter changes.
>>>
>>> This is completely broken for two reasons:
>>
>> I agree ;-)
>>
>>>
>>> 1. It relies on "unique=0" to determine reloadability. "unique" was
>>> originally intended (and is widely used by existing resource agents) as
>>> a hint to UIs to indicate which parameters uniquely determine a resource
>>> instance. That is, two resource instances should never have the same
>>> value of a "unique" parameter. For this purpose, it makes perfect sense
>>> that (for example) the path to a binary command would have unique=0 --
>>> multiple resource instances could (and likely would) use the same
>>> binary. However, such a parameter could never be reloadable.
>>
>> I thought unique=0 were reloadable (unique=1 were not)...

Exactly

> I see a doubly-distorted picture here:
> - actually "unique=1" on a RA parameter (together with this RA supporting
>   "reload") currently leads to reload-on-change

Are you 100% sure about that?

> - also the provided example shows why reload for "unique=0" is wrong,
>   but as the opposite applies as of current state, it's not an argument
>   why something is broken
>
> See also:
> https://github.com/ClusterLabs/pacemaker/commit/2f5d44d4406e9a8fb5b380cb56ab8a70d7ad9c23

What about it?

>
>>> 2. Every known resource agent that implements a reload action does so
>>> incorrectly. Pacemaker uses reload for changes in the resource's
>>> *pacemaker* configuration, while all known RAs use reload for a
>>> service's native reload capability of its own configuration file. As an
>>> example, the ocf:heartbeat:named RA calls "rndc reload" for its reload
>>> action, which will have zero effect on any pacemaker-configured
>>> parameters -- and on top of that, the RA uses "unique=0" in its correct
>>> UI sense, and none of those parameters are actually reloadable.
>
> (per the last subclause, applicable also, after mentioned inversion, for
> "unique=1", such as a pid file path, which cannot be reloadable for
> apparent reason)
>
>> Maybe LSB confusion...
>
> That's not entirely fair vindication, as when you have to do some
> extra actions with parameters in LSB-aliased "start" action in the
> RA, you should do such reflections also for "reload".
>
>>> My proposed solution is:
>>>
>>> * Add a new "reloadable" attribute for resource agent metadata, to
>>> indicate reloadable parameters. Pacemaker would use this instead of
>>> "unique".
>>
>> No objections if you change the XML metadata version number this time ;-)
>
> Good point, but I guess everyone's a bit scared to open this Pandora
> box as there's so much technical debt connected to that (unifying FA/RA
> metadata if possible, adding new UI-oriented annotations, pacemaker's
> silent additions like "private" parameter).
> I'd imagine an established authority for OCF matters (and maintaining
> https://github.com/ClusterLabs/OCF-spec) and at least partly formalized
> process inspired by Python PEPs for coordinated development:
> https://www.python.org/dev/peps/pep-0001/
>
> 
>
>>> * Add a new "reload-options" RA action for the ability to reload
>>> Pacemaker-configured options. Pacemaker would call this instead of "reload".
>>
>> Why not "reload-parameters"?
>
> That came to my mind as well.  Or not wasting time/space on too many
> letters, just "reload-params", perhaps.

The changes for when a reload should take place make plenty of sense;
this is clearly an ongoing source of confusion.  However, I'm not so sure
about this part.

Would it not be better to have a single reload operation that takes
into account both the new config and any changed parameters?  When would we
want to update from only one source of changes?

Splitting the functionality into two functions seems like it would
increase, not decrease, confusion.  It took even me some time to realise
what you had in mind.

Or is the intention that, since most RA writers only think of config
file reload, we are protecting users from incomplete agents?  I would
have thought requiring the new 'reloadable' to be added to attributes
would be sufficient for this purpose (call it
'this-attribute-is-reloadable' if you really want to hammer it home
:-).
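
For readers following the thread, the decision being debated can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not Pacemaker's code: the function name choose_action and the dictionary layout are hypothetical, and "reloadable" is the proposed (not yet existing) metadata attribute. It models the rule that a reload is used only when the agent advertises the action and every changed parameter is explicitly marked reloadable.

```python
# Hypothetical model of the proposal discussed in this thread; not Pacemaker source.
def choose_action(changed_params, param_meta, agent_actions, reload_action="reload"):
    """Return 'none', the reload action name, or 'restart' for a parameter change."""
    if not changed_params:
        return "none"
    if reload_action not in agent_actions:
        # The agent cannot reload at all, so fall back to stop-then-start.
        return "restart"
    # Reload only when *every* changed parameter is explicitly marked reloadable.
    if all(param_meta.get(p, {}).get("reloadable", False) for p in changed_params):
        return reload_action
    return "restart"


# Illustrative metadata: "unique" stays a UI hint, "reloadable" drives the decision.
meta = {
    "binary": {"unique": False, "reloadable": False},
    "pidfile": {"unique": True, "reloadable": False},
    "loglevel": {"unique": False, "reloadable": True},
}
actions = {"start", "stop", "monitor", "reload"}

print(choose_action({"loglevel"}, meta, actions))
print(choose_action({"loglevel", "binary"}, meta, actions))
```

Passing reload_action="reload-options" (with that name in agent_actions) models the two-action variant, while keeping the default models the single-reload variant suggested here; the per-parameter check is the same in both cases.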

>
> 
>

Re: [ClusterLabs] HA iSCSITarget Using FileIO

2016-07-13 Thread Jason A Ramsey
Oh, and I forgot to add that when I try to create the LUN with the 
implementation=”lio” parameter, I see the following:

[root@hdc1anas002 ec2-user]# pcs resource create hdcvbnas_lun0 
ocf:heartbeat:iSCSILogicalUnit 
target_iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" lun="0" 
path=/dev/drbd1 implementation="lio" op monitor interval=15s


[root@hdc1anas002 ec2-user]# pcs status
Cluster name: hdcvbnas
Last updated: Wed Jul 13 16:29:36 2016          Last change: Wed Jul 13 16:29:33 2016 by root via cibadmin on hdc1anas002
Stack: corosync
Current DC: hdc1anas002 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 7 resources configured

Online: [ hdc1anas002 hdc1bnas002 ]

Full list of resources:

 Master/Slave Set: hdcvbnas_tgtclone [hdcvbnas_tgt]
     Masters: [ hdc1anas002 ]
     Slaves: [ hdc1bnas002 ]
 hdcvbnas_tgtfs   (ocf::heartbeat:Filesystem):        Started hdc1anas002
 hdcvbnas_ip0     (ocf::heartbeat:IPaddr2):           Started hdc1anas002
 hdcvbnas_ip1     (ocf::heartbeat:IPaddr2):           Started hdc1bnas002
 hdcvbnas_tgtsvc  (ocf::heartbeat:iSCSITarget):       Started hdc1bnas002
 hdcvbnas_lun0    (ocf::heartbeat:iSCSILogicalUnit):  FAILED hdc1anas002 (unmanaged)

Failed Actions:
* hdcvbnas_lun0_stop_0 on hdc1anas002 'not installed' (5): call=251, status=complete, exitreason='Setup problem: couldn't find command: tcm_node',
    last-rc-change='Wed Jul 13 16:29:33 2016', queued=0ms, exec=31ms


--

[ jR ]
  M: +1 (703) 628-2621
  @: ja...@eramsey.org

  there is no path to greatness; greatness is the path

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] HA iSCSITarget Using FileIO

2016-07-13 Thread Jason A Ramsey
I’m having some difficulty setting up a PCS/Corosync HA iSCSI target. I’m able
to create the iSCSI target resource (it spins up the target service properly
when the pcs command is issued). However, when I attempt to create a LUN, I get
nothing but error messages. This works:

pcs resource create hdcvbnas_tgtsvc ocf:heartbeat:iSCSITarget \
iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" op monitor interval=15s

But this does not:

pcs resource create hdcvbnas_tgtsvc ocf:heartbeat:iSCSITarget \
iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" implementation="lio" \
portals="10.0.96.100 10.0.96.101" op monitor interval=15s

Also, after the first (working) command has been issued, creating the LUN does
not work:

pcs resource create hdcvbnas_lun0 ocf:heartbeat:iSCSILogicalUnit \
target_iqn="iqn.2016-07.local.hsinawsdev:hdcvadbs-witness" lun="0" \
path=/dev/drbd1 op monitor interval=15s

Here are the results:

[root@hdc1anas002 ec2-user]# pcs status
Cluster name: hdcvbnas
Last updated: Wed Jul 13 15:59:05 2016          Last change: Wed Jul 13 15:59:03 2016 by root via cibadmin on hdc1anas002
Stack: corosync
Current DC: hdc1anas002 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
2 nodes and 7 resources configured

Online: [ hdc1anas002 hdc1bnas002 ]

Full list of resources:

 Master/Slave Set: hdcvbnas_tgtclone [hdcvbnas_tgt]
     Masters: [ hdc1anas002 ]
     Slaves: [ hdc1bnas002 ]
 hdcvbnas_tgtfs   (ocf::heartbeat:Filesystem):        Started hdc1anas002
 hdcvbnas_ip0     (ocf::heartbeat:IPaddr2):           Started hdc1anas002
 hdcvbnas_ip1     (ocf::heartbeat:IPaddr2):           Started hdc1bnas002
 hdcvbnas_tgtsvc  (ocf::heartbeat:iSCSITarget):       Started hdc1bnas002
 hdcvbnas_lun0    (ocf::heartbeat:iSCSILogicalUnit):  Stopped

Failed Actions:
* hdcvbnas_lun0_start_0 on hdc1anas002 'unknown error' (1): call=243, status=complete, exitreason='none',
    last-rc-change='Wed Jul 13 15:59:03 2016', queued=0ms, exec=123ms
* hdcvbnas_lun0_start_0 on hdc1bnas002 'unknown error' (1): call=257, status=complete, exitreason='none',
    last-rc-change='Wed Jul 13 15:59:03 2016', queued=0ms, exec=124ms


PCSD Status:
  hdc1anas002: Online
  hdc1bnas002: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled


Unfortunately, neither the corosync nor syslog logs provide any information 
that is remotely helpful in troubleshooting this. I appreciate any help you 
might provide.

--

[ jR ]
  @: ja...@eramsey.org

  there is no path to greatness; greatness is the path


Re: [ClusterLabs] Clusvcadm -Z substitute in Pacemaker

2016-07-13 Thread Jan Pokorný
On 13/07/16 12:50 +0200, emmanuel segura wrote:
> using pcs resource unmanage leave the monitoring resource actived, I
> usually set the monitor interval=0 :)

Some time ago, I filed a bug against pcs asking for it to perform these
two steps in one go: https://bugzilla.redhat.com/1303969
This is slowly becoming a recurring topic.

-- 
Jan (Poki)




Re: [ClusterLabs] Clusvcadm -Z substitute in Pacemaker

2016-07-13 Thread Ken Gaillot
On 07/13/2016 09:56 AM, emmanuel segura wrote:
> enabled=false works with every pacemaker versions?

It was introduced in Pacemaker 1.0.2, so realistically, yes :)

> 2016-07-13 16:48 GMT+02:00 Ken Gaillot :
>> On 07/13/2016 05:50 AM, emmanuel segura wrote:
>>> using pcs resource unmanage leave the monitoring resource actived, I
>>> usually set the monitor interval=0 :)
>>
>> Yep :)
>>
>> An easier way is to set "enabled=false" on the monitor, so you don't
>> have to remember what your interval was later. You can set it in the
>> op_defaults section to disable all operations at once (assuming no
>> operation has "enabled=true" explicitly set).
>>
>> Similarly, you can set is_managed=false in rsc_defaults to unmanage all
>> resources (that don't have "is_managed=true" explicitly set).
>>



Re: [ClusterLabs] Clusvcadm -Z substitute in Pacemaker

2016-07-13 Thread Ken Gaillot
On 07/13/2016 05:50 AM, emmanuel segura wrote:
> using pcs resource unmanage leave the monitoring resource actived, I
> usually set the monitor interval=0 :)

Yep :)

An easier way is to set "enabled=false" on the monitor, so you don't
have to remember what your interval was later. You can set it in the
op_defaults section to disable all operations at once (assuming no
operation has "enabled=true" explicitly set).

Similarly, you can set is_managed=false in rsc_defaults to unmanage all
resources (that don't have "is_managed=true" explicitly set).
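
The precedence described above (an explicitly set value on an operation or resource wins over the corresponding defaults section) can be illustrated with a small Python sketch. It is a simplified model, not Pacemaker code; the helper name effective_option and the plain dictionaries are assumptions made for illustration.

```python
# Simplified, hypothetical model of "explicit value beats section default".
def effective_option(explicit, defaults, name, builtin):
    """Resolve one option: explicit setting, then defaults section, then built-in."""
    if name in explicit:
        return explicit[name]
    if name in defaults:
        return defaults[name]
    return builtin


# op_defaults disables all operations at once...
op_defaults = {"enabled": "false"}
monitor_plain = {"interval": "10s"}
# ...unless an operation explicitly sets enabled=true.
monitor_pinned = {"interval": "10s", "enabled": "true"}

print(effective_option(monitor_plain, op_defaults, "enabled", "true"))
print(effective_option(monitor_pinned, op_defaults, "enabled", "true"))
```

The same lookup order applies to is_managed with rsc_defaults: a resource that explicitly sets is_managed=true stays managed even when the default is false.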

> 2016-07-11 10:43 GMT+02:00 Tomas Jelinek :
>> On 9.7.2016 at 06:39, jaspal singla wrote:
>>>
>>> Hello Everyone,
>>>
>>> I need little help, if anyone can give some pointers, it would help me a
>>> lot.
>>>
>>> In RHEL-7.x:
>>>
>>> There is concept of pacemaker and when I use the below command to freeze
>>> my resource group operation, it actually stops all of the resources
>>> associated under the resource group.
>>>
>>> # pcs cluster standby 
>>>
>>> # pcs cluster unstandby 
>>>
>>> Result:  This actually stops all of the resource group in that node
>>> (ctm_service is one of the resource group, which gets stop including
>>> database as well, it goes to MOUNT mode)
>>
>>
>> Hello Jaspal,
>>
>> that's what it's supposed to do. Putting a node into standby means the node
>> cannot host any resources.
>>
>>>
>>> However; through clusvcadm command on RHEL-6.x, it doesn't stop the
>>> ctm_service there and my database is in RW mode.
>>>
>>> # clusvcadm -Z ctm_service
>>>
>>> # clusvcadm -U ctm_service
>>>
>>> So my concern here is - Freezing/unfreezing should not affect the status
>>> of the group. Is there any way around to achieve the same in RHEL-7.x as
>>> well, that was done with clusvcadm on RHEL 6?
>>
>>
>> Maybe you are looking for
>> # pcs resource unmanage 
>> and
>> # pcs resource manage 
>>
>> Regards,
>> Tomas
>>
>>>
>>> Thanks
>>>
>>> Jaspal



Re: [ClusterLabs] ocf:heartbeat:apache does not start

2016-07-13 Thread Heiko Reimer


On 13.07.2016 at 13:17, Heiko Reimer wrote:


On 13.07.2016 at 11:09, Klaus Wenninger wrote:

On 07/13/2016 09:24 AM, Heiko Reimer wrote:

On 13.07.2016 at 09:09, Li Junliang wrote:

On Wed, 2016-07-13 at 08:59 +0200, Heiko Reimer wrote:

Hi,

I am trying to set up a Pacemaker apache resource with ocf:heartbeat:apache,
but when Pacemaker tries to start the resource I get

Failed Actions:
* apache2_start_0 on node1 'not installed' (5): call=186,
status=complete, exitreason='environment is invalid, resource considered stopped',

Here is my config:

primitive apache2 apache \
   params configfile="/etc/apache2/apache2.conf" \
   params httpd="/usr/sbin/apache2" \
   params testurl="http://localhost" \
   op monitor interval=10s timeout=20s \
   op start timeout=40s interval=0 \
   op stop timeout=60s interval=0 \
   meta target-role=Started

I am using Debian 8.5 with Apache 2.4.10 and Pacemaker 1.1.14.

Maybe you should check your apache installation on node1. Sometimes I
come across these problems, and in the end I find that apache2 is not in
/usr/sbin or apache2.conf is in a directory other than
/etc/apache2.

I have checked both paths. They are OK. With systemd:apache2
everything works.
Yesterday I had the problem on both nodes (I
have a two-node setup).

If you check apache_monitor in the RA (of course I don't know your
version of it) you
can see that it also returns $OCF_ERR_INSTALLED if it doesn't
find the HTTP client
used to fetch the test URL. In the RA version I have installed, the
exitreason would give
more info about that, like "... could not find http client ...", but
that might differ
between versions.
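
One quick way to test this theory on a node is to check whether the commands the agent depends on are actually on the PATH. The Python sketch below is only an illustration of that check, not the RA's logic: the helper name check_installed is hypothetical, and the idea that the HTTP client is one of wget or curl is an assumption that should be verified against the installed RA version. The value 5 matches the 'not installed' (5) shown in the failed action above.

```python
import shutil

OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5  # the "'not installed' (5)" return code seen in the failed action


def check_installed(required, any_of=()):
    """Return (rc, missing): rc is OCF_ERR_INSTALLED if a needed command is absent.

    required: commands that must all be present on PATH.
    any_of:   alternatives of which at least one must be present (assumed: HTTP clients).
    """
    missing = [cmd for cmd in required if shutil.which(cmd) is None]
    if missing:
        return OCF_ERR_INSTALLED, missing
    if any_of and not any(shutil.which(cmd) for cmd in any_of):
        return OCF_ERR_INSTALLED, list(any_of)
    return OCF_SUCCESS, []


# Assumed candidates; adjust to what your RA version really calls.
rc, missing = check_installed(["apache2"], any_of=["wget", "curl"])
print(rc, missing)
```

Note that the cluster runs agents with its own environment, so a command found in an interactive shell may still be missing from the PATH the agent sees.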

It looks like the RA does not create the apache2 directory in /var/run.
Which user creates the apache2 folder in /var/run? Does Pacemaker use a
specific user? I think it is root?



Mit freundlichen Grüßen / Best regards
Heiko Reimer




_
You are receiving this message on behalf of the Sport-Tiedje Group
Sport-Tiedje Head Office:
Sport-Tiedje GmbH
International Headquarters
Flensburger Str. 55
D-24837 Schleswig

Geschaeftsfuehrer / managing directors: Christian Grau, Sebastian
Campmann, Dr. Bernhard Schenkel
Amtsgericht / local court Flensburg: HRB 1000 SL
Steuer-Nr.: 1529319096
UST-ID: DE813211547




Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: Antw: RES: Pacemaker and OCFS2 on stand alone mode

2016-07-13 Thread Ken Gaillot
On 07/13/2016 03:10 AM, Ulrich Windl wrote:
> Ken Gaillot wrote on 12.07.2016 at 21:19 in message
> <578542bf.9010...@redhat.com>:
>> On 07/12/2016 01:16 AM, Ulrich Windl wrote:
> 
> [...]
>>> What I mean is: there is no "success status" for STONITH; it is assumed that
>>> the node will be down after issuing a successful stonith command. You are
>>> claiming your stonith command was not logging any error, so the cluster will
>>> assume STONITH was successful after a timeout.
>>
>> Fence agents do return success/failure; the cluster considers a timeout
>> to be a failure. The only time the cluster assumes a successful fence is
>> when sbd-based watchdog is in use.
> 
> Hi!
> 
> Sorry, but I don't see the difference: If SBD delivers a command 
> successfully, there is no guarantee that the victim node actually executes 
> the command and resets.
> If you use any other fencing command (like submitting some command to an 
> external device) the situation is not different: Successfully submitting the 
> command does not mean the STONITH will succeed in every case (you could even 
> turn off power in the wrong PDU, which is still a "success" from the cluster's 
> perspective)
> [...]
> 
> What I really wanted to say is:
> If the fencing command logged an error, try to fix it; if it did not, try to 
> find out why fencing did not work.
> 
> Regards,
> Ulrich

Yes, I understand your point now, and agree completely.

The cluster can only respond to the status code (or timeout) it receives
from the fence agent. There may be problems beyond that point (in the
fence agent and/or the device itself) that result in success being
returned incorrectly, and that must be investigated separately.
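
As a rough illustration of that boundary, the Python sketch below treats a fence attempt as successful only when the command exits with status 0, and counts a timeout as a failure. It is a simplified stand-in, not Pacemaker's fencer: the function name run_fence_agent is hypothetical, and the commands used are placeholder Python one-liners rather than real fence agents.

```python
import subprocess
import sys


def run_fence_agent(cmd, timeout_s):
    """Return True only if the command exits 0 within timeout_s seconds.

    A non-zero exit status or a timeout both count as a failed fence attempt.
    Whether the target node is really down is outside what this can observe.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


# Placeholder commands standing in for a fence agent invocation.
ok = [sys.executable, "-c", "import sys; sys.exit(0)"]
failed = [sys.executable, "-c", "import sys; sys.exit(1)"]
hung = [sys.executable, "-c", "import time; time.sleep(30)"]

print(run_fence_agent(ok, 10))
print(run_fence_agent(failed, 10))
print(run_fence_agent(hung, 0.5))
```

An agent that exits 0 after switching the wrong PDU outlet would still be reported as a success here, which is exactly the case that has to be investigated outside the cluster.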



Re: [ClusterLabs] can't start/stop a drbd resource with pacemaker

2016-07-13 Thread Kristoffer Grönlund
"Lentes, Bernd"  writes:

> Starting or stopping drbd with start/stop works neither for the ms
> resource nor for the primitive.
> If I try to stop it, it keeps running, even if I do a cleanup first (for both
> resources).
> Which resource should I stop first? The primitive or the ms?

You should always act on the container, so the ms in this case (or the
clone, or the group).

> I tried both, but neither worked. Other resources, like an IP, can be started/stopped
> with crm.
> When I change the target-role of the primitive via "crm configure edit"
> and commit that, it starts/stops immediately.
> But that can't be the preferred way to start/stop a drbd resource?

All crm resource stop  does is set the target-role... so there
is something else going on.

>
> If you need more information ask me.

You can run crm with the -d argument to get more information about what
it does, and -dR to get a full trace of all the commands it executes.

Grepping the logs on the DC node (see crm status output) will probably
get you more hints as well.

Finally, you can run "crm report" or hb_report to collect and analyse
the logs on all your nodes, to get a better overview of what is going on
in the cluster.

But really, I would recommend looking at a tutorial or guide for setting
up DRBD in a pacemaker cluster, as there are multiple steps that have to
be correct for it to work.

Cheers,
Kristoffer

>
> Thanks.
>
>
> Bernd
>
> -- 
> Bernd Lentes 
>
> Systemadministration 
> institute of developmental genetics 
> Gebäude 35.34 - Raum 208 
> HelmholtzZentrum München 
> bernd.len...@helmholtz-muenchen.de 
> phone: +49 (0)89 3187 1241 
> fax: +49 (0)89 3187 2294 
>
> Wer glaubt das Projektleiter Projekte leiten 
> der glaubt auch das Zitronenfalter 
> Zitronen falten
>  
>
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Dr. Alfons Enhsen, Renate 
> Schlusen (komm.)
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
>
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



[ClusterLabs] can't start/stop a drbd resource with pacemaker

2016-07-13 Thread Lentes, Bernd
Hi,

I'm new to HA clusters. I'm currently setting up a two-node cluster and 
playing around with it.
I have a primitive drbd resource and a corresponding ms one:

crm(live)resource# status

 Master/Slave Set: ms_drbd_r0 [prim_drbd_r0]
 Masters: [ sunhb58820 ]
 Slaves: [ sunhb65278 ]


Starting or stopping drbd with start/stop does not work, neither for the ms 
resource nor for the primitive.
If I try to stop it, it keeps running, even if I do a cleanup beforehand (for both 
resources).
Which resource should I stop first? The primitive or the ms?
I tried both, but neither worked. Other resources, like an IP, can be started/stopped with 
crm.
When I change the target-role of the primitive via "crm configure edit" and 
commit that, it starts/stops immediately.
But that can't be the preferred way to start/stop a drbd resource?

If you need more information ask me.

Thanks.


Bernd

-- 
Bernd Lentes 

Systemadministration 
institute of developmental genetics 
Gebäude 35.34 - Raum 208 
HelmholtzZentrum München 
bernd.len...@helmholtz-muenchen.de 
phone: +49 (0)89 3187 1241 
fax: +49 (0)89 3187 2294 



Re: [ClusterLabs] ocf:heartbeat:apache does not start

2016-07-13 Thread Heiko Reimer


On 13.07.2016 at 11:09, Klaus Wenninger wrote:
> On 07/13/2016 09:24 AM, Heiko Reimer wrote:
>> On 13.07.2016 at 09:09, Li Junliang wrote:
>>> On Wed, 2016-07-13 at 08:59 +0200, Heiko Reimer wrote:
>>>> Hi,
>>>>
>>>> I am trying to set up a Pacemaker apache resource with
>>>> ocf:heartbeat:apache, but when Pacemaker tries to start the
>>>> resource I get
>>>>
>>>> Failed Actions:
>>>> * apache2_start_0 on node1 'not installed' (5): call=186,
>>>> status=complete, exitreason='environment is invalid, resource
>>>> considered stopped',
>>>>
>>>> Here is my config:
>>>>
>>>> primitive apache2 apache \
>>>>    params configfile="/etc/apache2/apache2.conf" \
>>>>    params httpd="/usr/sbin/apache2" \
>>>>    params testurl="http://localhost" \
>>>>    op monitor interval=10s timeout=20s \
>>>>    op start timeout=40s interval=0 \
>>>>    op stop timeout=60s interval=0 \
>>>>    meta target-role=Started
>>>>
>>>> I am using Debian 8.5 with Apache 2.4.10 and Pacemaker 1.1.14.
>>>
>>> Maybe you should check your apache installation on node1. Sometimes I
>>> come across these problems and finally find that apache2 is not in
>>> /usr/sbin or apache2.conf is in a directory other than
>>> /etc/apache2.
>>
>> I have checked the paths of both; they are OK. With systemd:apache2
>> everything works.
>> Yesterday the problem occurred on both nodes (I have a two-node
>> setup).
>
> If you check apache_monitor in the RA (of course I don't know your
> version of it), you can see that it returns $OCF_ERR_INSTALLED as well
> if it doesn't find the HTTP client used to fetch the test URL. In the
> RA version I have installed, the exitreason would give more information
> about that, like "... could not find http client ...", but that might
> differ between versions.

It looks like the RA does not create the apache2 directory in /var/run.
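
One quick way to test that diagnosis is to check for the runtime directory before starting the resource. The sketch below is an illustration under assumptions, not something the resource agent provides: the function name ensure_run_dir is hypothetical, and the directory is taken as an argument rather than hard-coded.

```shell
#!/bin/sh
# Hypothetical helper: report whether a runtime directory exists and
# create it if it is missing. Not part of ocf:heartbeat:apache.
ensure_run_dir() {
    run_dir=$1   # e.g. /var/run/apache2, the directory mentioned above

    if [ -d "$run_dir" ]; then
        echo "present: $run_dir"
        return 0
    fi
    echo "missing: $run_dir, creating it" >&2
    mkdir -p "$run_dir"
}
```

On systems where /var/run lives on a tmpfs, a directory created this way would be gone after the next reboot, so this only helps to confirm the cause, not to fix it permanently.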



Best regards
Heiko Reimer




_
You are receiving this message on behalf of the Sport-Tiedje Group
Sport-Tiedje Head Office:
Sport-Tiedje GmbH
International Headquarters
Flensburger Str. 55
D-24837 Schleswig

Geschaeftsfuehrer / managing directors: Christian Grau, Sebastian
Campmann, Dr. Bernhard Schenkel
Amtsgericht / local court Flensburg: HRB 1000 SL
Steuer-Nr.: 1529319096
UST-ID: DE813211547




Re: [ClusterLabs] Clusvcadm -Z substitute in Pacemaker

2016-07-13 Thread emmanuel segura
Using pcs resource unmanage leaves the resource's monitor operation active, so I
usually set the monitor interval=0 as well :)

2016-07-11 10:43 GMT+02:00 Tomas Jelinek :
> On 9.7.2016 at 06:39, jaspal singla wrote:
>>
>> Hello Everyone,
>>
>> I need little help, if anyone can give some pointers, it would help me a
>> lot.
>>
>> In RHEL-7.x:
>>
>> RHEL 7.x uses Pacemaker, and when I use the commands below to freeze
>> my resource group, it actually stops all of the resources
>> in the resource group.
>>
>> # pcs cluster standby 
>>
>> # pcs cluster unstandby 
>>
>> Result: This actually stops all of the resource groups on that node
>> (ctm_service is one of the resource groups; it gets stopped, including
>> the database, which goes to MOUNT mode)
>
>
> Hello Jaspal,
>
> that's what it's supposed to do. Putting a node into standby means the node
> cannot host any resources.
>
>>
>> However; through clusvcadm command on RHEL-6.x, it doesn't stop the
>> ctm_service there and my database is in RW mode.
>>
>> # clusvcadm -Z ctm_service
>>
>> # clusvcadm -U ctm_service
>>
>> My concern is that freezing/unfreezing should not affect the status
>> of the group. Is there any way to achieve in RHEL 7.x what was
>> done with clusvcadm on RHEL 6?
>
>
> Maybe you are looking for
> # pcs resource unmanage 
> and
> # pcs resource manage 
>
> Regards,
> Tomas
>
>>
>> Thanks
>>
>> Jaspal
>>
>>
>>
>>



-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^



Re: [ClusterLabs] ocf:heartbeat:apache does not start

2016-07-13 Thread Klaus Wenninger
On 07/13/2016 09:24 AM, Heiko Reimer wrote:
>
> On 13.07.2016 at 09:09, Li Junliang wrote:
>> On Wed, 2016-07-13 at 08:59 +0200, Heiko Reimer wrote:
>>> Hi,
>>>
>>> I am trying to set up a Pacemaker apache resource with
>>> ocf:heartbeat:apache, but when Pacemaker tries to start the
>>> resource I get
>>>
>>> Failed Actions:
>>> * apache2_start_0 on node1 'not installed' (5): call=186,
>>> status=complete, exitreason='environment is invalid, resource
>>> considered stopped',
>>>
>>> Here is my config:
>>>
>>> primitive apache2 apache \
>>>   params configfile="/etc/apache2/apache2.conf" \
>>>   params httpd="/usr/sbin/apache2" \
>>>   params testurl="http://localhost" \
>>>   op monitor interval=10s timeout=20s \
>>>   op start timeout=40s interval=0 \
>>>   op stop timeout=60s interval=0 \
>>>   meta target-role=Started
>>>
>>> I am using Debian 8.5 with Apache 2.4.10 and Pacemaker 1.1.14.
>> Maybe you should check your apache installation on node1. Sometimes I
>> come across these problems and finally find that apache2 is not in
>> /usr/sbin or apache2.conf is in a directory other than
>> /etc/apache2.
> I have checked the paths of both; they are OK. With systemd:apache2
> everything works.
> Yesterday the problem occurred on both nodes (I have a two-node
> setup).

If you check apache_monitor in the RA (of course I don't know your
version of it), you can see that it returns $OCF_ERR_INSTALLED as well
if it doesn't find the HTTP client used to fetch the test URL. In the
RA version I have installed, the exitreason would give more information
about that, like "... could not find http client ...", but that might
differ between versions.
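
The same preconditions can be checked by hand, outside of Pacemaker. The sketch below is not the resource agent's code: the function name check_apache_env is hypothetical, and treating wget or curl as the candidate HTTP clients is an assumption made for illustration. The return value 5 mirrors the 'not installed' (5) shown in the failed action.

```shell
#!/bin/sh
# Hypothetical pre-flight check; NOT the code of ocf:heartbeat:apache.
# Returns 5 (reported as 'not installed' above) when a prerequisite is missing.
OCF_ERR_INSTALLED=5

check_apache_env() {
    httpd_bin=$1     # e.g. /usr/sbin/apache2
    config_file=$2   # e.g. /etc/apache2/apache2.conf

    if [ ! -x "$httpd_bin" ]; then
        echo "httpd binary not executable: $httpd_bin" >&2
        return $OCF_ERR_INSTALLED
    fi
    if [ ! -r "$config_file" ]; then
        echo "config file not readable: $config_file" >&2
        return $OCF_ERR_INSTALLED
    fi
    # Assumption: wget or curl is the client used to fetch the test URL.
    if ! command -v wget >/dev/null 2>&1 && ! command -v curl >/dev/null 2>&1; then
        echo "could not find an http client (wget or curl)" >&2
        return $OCF_ERR_INSTALLED
    fi
    return 0
}
```

For the configuration quoted above this would be called as "check_apache_env /usr/sbin/apache2 /etc/apache2/apache2.conf" on each node; the message on stderr names the first missing piece.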

>>> Best regards
>>> Heiko Reimer


Re: [ClusterLabs] ocf:heartbeat:apache does not start

2016-07-13 Thread Heiko Reimer


On 13.07.2016 at 09:09, Li Junliang wrote:
> On Wed, 2016-07-13 at 08:59 +0200, Heiko Reimer wrote:
>> Hi,
>>
>> I am trying to set up a Pacemaker apache resource with
>> ocf:heartbeat:apache, but when Pacemaker tries to start the
>> resource I get
>>
>> Failed Actions:
>> * apache2_start_0 on node1 'not installed' (5): call=186,
>> status=complete, exitreason='environment is invalid, resource
>> considered stopped',
>>
>> Here is my config:
>>
>> primitive apache2 apache \
>>   params configfile="/etc/apache2/apache2.conf" \
>>   params httpd="/usr/sbin/apache2" \
>>   params testurl="http://localhost" \
>>   op monitor interval=10s timeout=20s \
>>   op start timeout=40s interval=0 \
>>   op stop timeout=60s interval=0 \
>>   meta target-role=Started
>>
>> I am using Debian 8.5 with Apache 2.4.10 and Pacemaker 1.1.14.
>
> Maybe you should check your apache installation on node1. Sometimes I
> come across these problems and finally find that apache2 is not in
> /usr/sbin or apache2.conf is in a directory other than
> /etc/apache2.

I have checked the paths of both; they are OK. With systemd:apache2 
everything works.
Yesterday the problem occurred on both nodes (I have a 
two-node setup).

Best regards
   
Heiko Reimer



