[ClusterLabs] Antw: emergency stop does not honor resources ordering constraints (?)

2016-12-06 Thread Ulrich Windl
> >>> Radoslaw Garbacz wrote on 06.12.2016 at 18:50 in message:
> Hi,
> 
> I have encountered a problem with pacemaker resource shutdown in (what seems
> like) any emergency situation, where order constraints are not honored.
> I would be grateful for any information on whether this behavior is
> intentional or should not happen (i.e. a testing issue rather than
> pacemaker behavior). It would also be helpful to know if there is any
> configuration parameter altering this, or whether there can be any reason
> (cluster event) triggering an unordered resource stop.
> 
> Thanks,
> 
> To illustrate the issue I provide an example below and my collected data.
> My environment uses the resource cloning feature - maybe this contributes to
> my test results.
> 
> 
> * Example:
> - having resources ordered with constraints: A -> B -> C
> - when stopping with the 'crm_resource' command (all at once), resources are
> stopped: C, B, A
> - when stopping by terminating pacemaker, resources are stopped: C, B, A
> - when there is a monitoring error or quorum is lost: no order is honored, e.g.
> B, C, A

Hi!

If the node does not have quorum, it cannot perform any cluster operations (IMHO).
Instead it will try to commit suicide, maybe with the help of self-fencing. So I
think this case is normal when quorum is lost.

Ulrich
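
For reference, the reaction to losing quorum is governed by the cluster-wide
no-quorum-policy property. A minimal sketch of checking and changing it
(illustrative commands only, not taken from the poster's configuration):

crm_attribute --query --name no-quorum-policy    # show the current value (default: stop)
pcs property set no-quorum-policy=freeze         # keep running resources, start nothing new
pcs property set no-quorum-policy=suicide        # fence the partition itself (needs STONITH)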

> 
> 
> 
> * Version details:
> Pacemaker 1.1.15-1.1f8e642.git.el6
> Corosync Cluster Engine, version '2.4.1.2-0da1'
> 
> 
> 
> * My ordering constraints:
> Ordering Constraints:
>   dbx_first_primary then dbx_head_head (kind:Mandatory)
>   dbx_first_primary-clone then dbx_head_head (kind:Mandatory)
>   dbx_head_head then dbx_mounts_nodes (kind:Mandatory)
>   dbx_head_head then dbx_mounts_nodes-clone (kind:Mandatory)
>   dbx_mounts_nodes then dbx_bind_mounts_nodes (kind:Mandatory)
>   dbx_mounts_nodes-clone then dbx_bind_mounts_nodes-clone (kind:Mandatory)
>   dbx_bind_mounts_nodes then dbx_nfs_nodes (kind:Mandatory)
>   dbx_bind_mounts_nodes-clone then dbx_nfs_nodes-clone (kind:Mandatory)
>   dbx_nfs_nodes then dbx_gss_datas (kind:Mandatory)
>   dbx_nfs_nodes-clone then dbx_gss_datas-clone (kind:Mandatory)
>   dbx_gss_datas then dbx_nfs_mounts_datas (kind:Mandatory)
>   dbx_gss_datas-clone then dbx_nfs_mounts_datas-clone (kind:Mandatory)
>   dbx_nfs_mounts_datas then dbx_swap_nodes (kind:Mandatory)
>   dbx_nfs_mounts_datas-clone then dbx_swap_nodes-clone (kind:Mandatory)
>   dbx_swap_nodes then dbx_sync_head (kind:Mandatory)
>   dbx_swap_nodes-clone then dbx_sync_head (kind:Mandatory)
>   dbx_sync_head then dbx_dbx_datas (kind:Mandatory)
>   dbx_sync_head then dbx_dbx_datas-clone (kind:Mandatory)
>   dbx_dbx_datas then dbx_dbx_head (kind:Mandatory)
>   dbx_dbx_datas-clone then dbx_dbx_head (kind:Mandatory)
>   dbx_dbx_head then dbx_web_head (kind:Mandatory)
>   dbx_web_head then dbx_ready_primary (kind:Mandatory)
>   dbx_web_head then dbx_ready_primary-clone (kind:Mandatory)
> 
> 
> 
> * Pacemaker stop (OK):
> ready.ocf.sh(dbx_ready_primary)[18639]: 2016/12/06_15:40:32 INFO:
> ready_stop: Stopping resource
> mng.ocf.sh(dbx_mng_head)[20312]:2016/12/06_15:40:44 INFO: mng_stop:
> Stopping resource
> web.ocf.sh(dbx_web_head)[20310]:2016/12/06_15:40:44 INFO:
> dbxcl_stop: Stopping resource
> dbx.ocf.sh(dbx_dbx_head)[20569]:2016/12/06_15:40:46 INFO:
> dbxcl_stop: Stopping resource
> sync.ocf.sh(dbx_sync_head)[20719]:  2016/12/06_15:40:54 INFO:
> sync_stop: Stopping resource
> swap.ocf.sh(dbx_swap_nodes)[21053]: 2016/12/06_15:40:56 INFO:
> swap_stop: Stopping resource
> nfs.ocf.sh(dbx_nfs_nodes)[21151]:   2016/12/06_15:40:58 INFO: nfs_stop:
> Stopping resource
> dbx_mounts.ocf.sh(dbx_bind_mounts_nodes)[21344]:2016/12/06_15:40:59
> INFO: dbx_mounts_stop: Stopping resource
> dbx_mounts.ocf.sh(dbx_mounts_nodes)[21767]: 2016/12/06_15:41:01 INFO:
> dbx_mounts_stop: Stopping resource
> head.ocf.sh(dbx_head_head)[22213]:  2016/12/06_15:41:04 INFO:
> head_stop: Stopping resource
> first.ocf.sh(dbx_first_primary)[22999]: 2016/12/06_15:41:11 INFO:
> first_stop: Stopping resource
> 
> 
> 
> * Quorum lost:
> sync.ocf.sh(dbx_sync_head)[23099]:  2016/12/06_16:42:04 INFO:
> sync_stop: Stopping resource
> nfs.ocf.sh(dbx_nfs_nodes)[23102]:   2016/12/06_16:42:04 INFO: nfs_stop:
> Stopping resource
> mng.ocf.sh(dbx_mng_head)[23101]:2016/12/06_16:42:04 INFO: mng_stop:
> Stopping resource
> ready.ocf.sh(dbx_ready_primary)[23104]: 2016/12/06_16:42:04 INFO:
> ready_stop: Stopping resource
> web.ocf.sh(dbx_web_head)[23344]:2016/12/06_16:42:04 INFO:
> dbxcl_stop: Stopping resource
> dbx_mounts.ocf.sh(dbx_bind_mounts_nodes)[23664]:2016/12/06_16:42:05
> INFO: dbx_mounts_stop: Stopping resource
> dbx_mounts.ocf.sh(dbx_mounts_nodes)[24459]: 2016/12/06_16:42:08 INFO:
> dbx_mounts_stop: Stopping resource
> head.ocf.sh(dbx_head_head)[25036]:  2016/12/06_16:42:11 INFO:
> head_stop: Stopping resource
> swap.ocf.sh(dbx_swap_nodes)[27491]: 2016/12/06_16:43

[ClusterLabs] Antw: Re: Error performing operation: Argument list too long

2016-12-06 Thread Ulrich Windl
>>> Ken Gaillot wrote on 06.12.2016 at 16:44 in message
<58329eaf-cbe0-55e0-7648-849879f1b...@redhat.com>:
> On 12/05/2016 02:29 PM, Shane Lawrence wrote:
>> I'm experiencing a strange issue with pacemaker. It is unable to check
>> the status of a systemd resource.
>> 
>> systemctl shows that the service crashed:
>> [root@xx ~]# systemctl status rsyslog
>> ● rsyslog.service - System Logging Service
>>Loaded: loaded (/usr/lib/systemd/system/rsyslog.service; enabled;
>> vendor preset: enabled)
>>Active: inactive (dead) since Mon 2016-12-05 07:41:11 UTC; 12h ago
>>  Docs: man:rsyslogd(8)
>>http://www.rsyslog.com/doc/ 
>>  Main PID: 22703 (code=exited, status=0/SUCCESS)
>> 
>> Dec 02 21:41:41 xx...xx systemd[1]: Starting Cluster
>> Controlled rsyslog...
>> Dec 02 21:41:41 xx...xx systemd[1]: Started Cluster
>> Controlled rsyslog.
>> Dec 05 07:41:08 xx...xx systemd[1]: Stopping System
>> Logging Service...
>> Dec 05 07:41:11 xx...xx systemd[1]: Stopped System
>> Logging Service.
>> Dec 05 07:41:40 xx...xx systemd[1]: Stopped System
>> Logging Service.
>> 
>> Attempting to view the status through Pacemaker shows:
>> [root@xx ~]# crm_resource --force-check -V -r rsyslog
>> Error performing operation: Argument list too long
>> [root@xx ~]# pcs resource debug-monitor rsyslog --full
>> Error performing operation: Argument list too long
>> 
>> The problem seems to be resolved (temporarily) by restarting corosync
>> and then starting the cluster again.
>> 
>> Has anyone else experienced this?
> 
> That is odd behavior. You may want to open a bug report at
> bugs.clusterlabs.org and attach your configuration and logs.
> 
> On Linux, the system error number for "Argument list too long" is the
> same as the OCF monitor status "Not running", so I suspect that it's a
> display issue rather than an actual error, but I'm not sure.

If it's strerror() being fed the wrong type of argument, it's a programming
error rather than a display error ;-)

> 
> Then the question would just be why is rsyslog stopping.
> 


Re: [ClusterLabs] Error performing operation: Argument list too long

2016-12-06 Thread Andrei Borzenkov
06.12.2016 20:41, Jan Pokorný wrote:
> On 06/12/16 09:44 -0600, Ken Gaillot wrote:
>> On 12/05/2016 02:29 PM, Shane Lawrence wrote:
>>> I'm experiencing a strange issue with pacemaker. It is unable to check
>>> the status of a systemd resource.
>>>
>>> systemctl shows that the service crashed:

No, it shows that the service process exited gracefully without errors.
There is no indication of a "crash" in the output you posted.

>>> [root@xx ~]# systemctl status rsyslog
>>> ● rsyslog.service - System Logging Service
>>>Loaded: loaded (/usr/lib/systemd/system/rsyslog.service; enabled;
>>> vendor preset: enabled)
>>>Active: inactive (dead) since Mon 2016-12-05 07:41:11 UTC; 12h ago
>>>  Docs: man:rsyslogd(8)
>>>http://www.rsyslog.com/doc/
>>>  Main PID: 22703 (code=exited, status=0/SUCCESS)
>>>
>>> Dec 02 21:41:41 xx...xx systemd[1]: Starting Cluster
>>> Controlled rsyslog...
>>> Dec 02 21:41:41 xx...xx systemd[1]: Started Cluster
>>> Controlled rsyslog.
>>> Dec 05 07:41:08 xx...xx systemd[1]: Stopping System
>>> Logging Service...
>>> Dec 05 07:41:11 xx...xx systemd[1]: Stopped System
>>> Logging Service.
>>> Dec 05 07:41:40 xx...xx systemd[1]: Stopped System
>>> Logging Service.
>>>
>>> Attempting to view the status through Pacemaker shows:
>>> [root@xx ~]# crm_resource --force-check -V -r rsyslog
>>> Error performing operation: Argument list too long
>>> [root@xx ~]# pcs resource debug-monitor rsyslog --full
>>> Error performing operation: Argument list too long
>>>
>>> The problem seems to be resolved (temporarily) by restarting corosync
>>> and then starting the cluster again.
>>>
>>> Has anyone else experienced this?
>>
>> That is odd behavior. You may want to open a bug report at
>> bugs.clusterlabs.org and attach your configuration and logs.
>>
>> On Linux, the system error number for "Argument list too long" is the
>> same as the OCF monitor status "Not running", so I suspect that it's a
>> display issue rather than an actual error, but I'm not sure.
>>
>> Then the question would just be why is rsyslog stopping.
> 
> Even more puzzling is that "Cluster Controlled rsyslog" has been started while
> "System Logging Service" is being stopped.  Could it be the result
> of a namespace/daemon/service clash of some kind?
> 

systemctl status does a simplistic match by unit name (rsyslog.service in
this case) to show logs. It is entirely possible that between unit start
and unit stop its definition was changed and systemd was reloaded (which
often happens implicitly during package installs, at the very least).
This would explain such behavior.

Dec 07 06:45:59 bor-Latitude-E5450 systemd[1]: Starting Before reload foo...
Dec 07 06:46:09 bor-Latitude-E5450 systemd[1]: Stopped Before reload foo.
Dec 07 06:46:35 bor-Latitude-E5450 systemd[1]: Started Before reload foo.

Edit the unit's Description and run systemctl daemon-reload:

Dec 07 06:47:01 bor-Latitude-E5450 systemd[1]: Stopping After reload foo...
Dec 07 06:47:01 bor-Latitude-E5450 systemd[1]: Stopped After reload foo.
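
A rough way to reproduce this, in case anyone wants to verify it (a sketch
using a hypothetical unit "foo.service", not taken from the original report):

cat > /etc/systemd/system/foo.service <<'EOF'
[Unit]
Description=Before reload foo
[Service]
ExecStart=/bin/sleep 3600
EOF
systemctl daemon-reload
systemctl start foo                 # journal logs "Starting Before reload foo..."
sed -i 's/Before reload/After reload/' /etc/systemd/system/foo.service
systemctl daemon-reload             # unit definition changes while the service runs
systemctl stop foo                  # journal logs "Stopping After reload foo..."
journalctl -u foo.service           # the same unit shows both descriptions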






[ClusterLabs] emergency stop does not honor resources ordering constraints (?)

2016-12-06 Thread Radoslaw Garbacz
Hi,

I have encountered a problem with pacemaker resource shutdown in (what seems
like) any emergency situation, where order constraints are not honored.
I would be grateful for any information on whether this behavior is
intentional or should not happen (i.e. a testing issue rather than
pacemaker behavior). It would also be helpful to know if there is any
configuration parameter altering this, or whether there can be any reason
(cluster event) triggering an unordered resource stop.

Thanks,

To illustrate the issue I provide an example below and my collected data.
My environment uses the resource cloning feature - maybe this contributes to
my test results.


* Example:
- having resources ordered with constraints: A -> B -> C
- when stopping with the 'crm_resource' command (all at once), resources are
stopped: C, B, A
- when stopping by terminating pacemaker, resources are stopped: C, B, A
- when there is a monitoring error or quorum is lost: no order is honored, e.g.
B, C, A
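
For comparison, a chain like A -> B -> C in the example above would typically
be expressed as mandatory, symmetrical orderings; with the default
symmetrical=true, the stop sequence should be the exact reverse of the start
sequence. A minimal sketch (A, B and C are placeholder resource names):

pcs constraint order A then B kind=Mandatory symmetrical=true
pcs constraint order B then C kind=Mandatory symmetrical=true
pcs constraint show --full    # verify the constraints and their options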



* Version details:
Pacemaker 1.1.15-1.1f8e642.git.el6
Corosync Cluster Engine, version '2.4.1.2-0da1'



* My ordering constraints:
Ordering Constraints:
  dbx_first_primary then dbx_head_head (kind:Mandatory)
  dbx_first_primary-clone then dbx_head_head (kind:Mandatory)
  dbx_head_head then dbx_mounts_nodes (kind:Mandatory)
  dbx_head_head then dbx_mounts_nodes-clone (kind:Mandatory)
  dbx_mounts_nodes then dbx_bind_mounts_nodes (kind:Mandatory)
  dbx_mounts_nodes-clone then dbx_bind_mounts_nodes-clone (kind:Mandatory)
  dbx_bind_mounts_nodes then dbx_nfs_nodes (kind:Mandatory)
  dbx_bind_mounts_nodes-clone then dbx_nfs_nodes-clone (kind:Mandatory)
  dbx_nfs_nodes then dbx_gss_datas (kind:Mandatory)
  dbx_nfs_nodes-clone then dbx_gss_datas-clone (kind:Mandatory)
  dbx_gss_datas then dbx_nfs_mounts_datas (kind:Mandatory)
  dbx_gss_datas-clone then dbx_nfs_mounts_datas-clone (kind:Mandatory)
  dbx_nfs_mounts_datas then dbx_swap_nodes (kind:Mandatory)
  dbx_nfs_mounts_datas-clone then dbx_swap_nodes-clone (kind:Mandatory)
  dbx_swap_nodes then dbx_sync_head (kind:Mandatory)
  dbx_swap_nodes-clone then dbx_sync_head (kind:Mandatory)
  dbx_sync_head then dbx_dbx_datas (kind:Mandatory)
  dbx_sync_head then dbx_dbx_datas-clone (kind:Mandatory)
  dbx_dbx_datas then dbx_dbx_head (kind:Mandatory)
  dbx_dbx_datas-clone then dbx_dbx_head (kind:Mandatory)
  dbx_dbx_head then dbx_web_head (kind:Mandatory)
  dbx_web_head then dbx_ready_primary (kind:Mandatory)
  dbx_web_head then dbx_ready_primary-clone (kind:Mandatory)



* Pacemaker stop (OK):
ready.ocf.sh(dbx_ready_primary)[18639]: 2016/12/06_15:40:32 INFO:
ready_stop: Stopping resource
mng.ocf.sh(dbx_mng_head)[20312]:2016/12/06_15:40:44 INFO: mng_stop:
Stopping resource
web.ocf.sh(dbx_web_head)[20310]:2016/12/06_15:40:44 INFO:
dbxcl_stop: Stopping resource
dbx.ocf.sh(dbx_dbx_head)[20569]:2016/12/06_15:40:46 INFO:
dbxcl_stop: Stopping resource
sync.ocf.sh(dbx_sync_head)[20719]:  2016/12/06_15:40:54 INFO:
sync_stop: Stopping resource
swap.ocf.sh(dbx_swap_nodes)[21053]: 2016/12/06_15:40:56 INFO:
swap_stop: Stopping resource
nfs.ocf.sh(dbx_nfs_nodes)[21151]:   2016/12/06_15:40:58 INFO: nfs_stop:
Stopping resource
dbx_mounts.ocf.sh(dbx_bind_mounts_nodes)[21344]:2016/12/06_15:40:59
INFO: dbx_mounts_stop: Stopping resource
dbx_mounts.ocf.sh(dbx_mounts_nodes)[21767]: 2016/12/06_15:41:01 INFO:
dbx_mounts_stop: Stopping resource
head.ocf.sh(dbx_head_head)[22213]:  2016/12/06_15:41:04 INFO:
head_stop: Stopping resource
first.ocf.sh(dbx_first_primary)[22999]: 2016/12/06_15:41:11 INFO:
first_stop: Stopping resource



* Quorum lost:
sync.ocf.sh(dbx_sync_head)[23099]:  2016/12/06_16:42:04 INFO:
sync_stop: Stopping resource
nfs.ocf.sh(dbx_nfs_nodes)[23102]:   2016/12/06_16:42:04 INFO: nfs_stop:
Stopping resource
mng.ocf.sh(dbx_mng_head)[23101]:2016/12/06_16:42:04 INFO: mng_stop:
Stopping resource
ready.ocf.sh(dbx_ready_primary)[23104]: 2016/12/06_16:42:04 INFO:
ready_stop: Stopping resource
web.ocf.sh(dbx_web_head)[23344]:2016/12/06_16:42:04 INFO:
dbxcl_stop: Stopping resource
dbx_mounts.ocf.sh(dbx_bind_mounts_nodes)[23664]:2016/12/06_16:42:05
INFO: dbx_mounts_stop: Stopping resource
dbx_mounts.ocf.sh(dbx_mounts_nodes)[24459]: 2016/12/06_16:42:08 INFO:
dbx_mounts_stop: Stopping resource
head.ocf.sh(dbx_head_head)[25036]:  2016/12/06_16:42:11 INFO:
head_stop: Stopping resource
swap.ocf.sh(dbx_swap_nodes)[27491]: 2016/12/06_16:43:08 INFO:
swap_stop: Stopping resource


-- 
Best Regards,

Radoslaw Garbacz
XtremeData Incorporation


Re: [ClusterLabs] Error performing operation: Argument list too long

2016-12-06 Thread Jan Pokorný
On 06/12/16 09:44 -0600, Ken Gaillot wrote:
> On 12/05/2016 02:29 PM, Shane Lawrence wrote:
>> I'm experiencing a strange issue with pacemaker. It is unable to check
>> the status of a systemd resource.
>> 
>> systemctl shows that the service crashed:
>> [root@xx ~]# systemctl status rsyslog
>> ● rsyslog.service - System Logging Service
>>Loaded: loaded (/usr/lib/systemd/system/rsyslog.service; enabled;
>> vendor preset: enabled)
>>Active: inactive (dead) since Mon 2016-12-05 07:41:11 UTC; 12h ago
>>  Docs: man:rsyslogd(8)
>>http://www.rsyslog.com/doc/
>>  Main PID: 22703 (code=exited, status=0/SUCCESS)
>> 
>> Dec 02 21:41:41 xx...xx systemd[1]: Starting Cluster
>> Controlled rsyslog...
>> Dec 02 21:41:41 xx...xx systemd[1]: Started Cluster
>> Controlled rsyslog.
>> Dec 05 07:41:08 xx...xx systemd[1]: Stopping System
>> Logging Service...
>> Dec 05 07:41:11 xx...xx systemd[1]: Stopped System
>> Logging Service.
>> Dec 05 07:41:40 xx...xx systemd[1]: Stopped System
>> Logging Service.
>> 
>> Attempting to view the status through Pacemaker shows:
>> [root@xx ~]# crm_resource --force-check -V -r rsyslog
>> Error performing operation: Argument list too long
>> [root@xx ~]# pcs resource debug-monitor rsyslog --full
>> Error performing operation: Argument list too long
>> 
>> The problem seems to be resolved (temporarily) by restarting corosync
>> and then starting the cluster again.
>> 
>> Has anyone else experienced this?
> 
> That is odd behavior. You may want to open a bug report at
> bugs.clusterlabs.org and attach your configuration and logs.
> 
> On Linux, the system error number for "Argument list too long" is the
> same as the OCF monitor status "Not running", so I suspect that it's a
> display issue rather than an actual error, but I'm not sure.
> 
> Then the question would just be why is rsyslog stopping.

Even more puzzling is that "Cluster Controlled rsyslog" has been started while
"System Logging Service" is being stopped.  Could it be the result
of a namespace/daemon/service clash of some kind?

-- 
Jan (Poki)




Re: [ClusterLabs] Error performing operation: Argument list too long

2016-12-06 Thread Ken Gaillot
On 12/05/2016 02:29 PM, Shane Lawrence wrote:
> I'm experiencing a strange issue with pacemaker. It is unable to check
> the status of a systemd resource.
> 
> systemctl shows that the service crashed:
> [root@xx ~]# systemctl status rsyslog
> ● rsyslog.service - System Logging Service
>Loaded: loaded (/usr/lib/systemd/system/rsyslog.service; enabled;
> vendor preset: enabled)
>Active: inactive (dead) since Mon 2016-12-05 07:41:11 UTC; 12h ago
>  Docs: man:rsyslogd(8)
>http://www.rsyslog.com/doc/
>  Main PID: 22703 (code=exited, status=0/SUCCESS)
> 
> Dec 02 21:41:41 xx...xx systemd[1]: Starting Cluster
> Controlled rsyslog...
> Dec 02 21:41:41 xx...xx systemd[1]: Started Cluster
> Controlled rsyslog.
> Dec 05 07:41:08 xx...xx systemd[1]: Stopping System
> Logging Service...
> Dec 05 07:41:11 xx...xx systemd[1]: Stopped System
> Logging Service.
> Dec 05 07:41:40 xx...xx systemd[1]: Stopped System
> Logging Service.
> 
> Attempting to view the status through Pacemaker shows:
> [root@xx ~]# crm_resource --force-check -V -r rsyslog
> Error performing operation: Argument list too long
> [root@xx ~]# pcs resource debug-monitor rsyslog --full
> Error performing operation: Argument list too long
> 
> The problem seems to be resolved (temporarily) by restarting corosync
> and then starting the cluster again.
> 
> Has anyone else experienced this?

That is odd behavior. You may want to open a bug report at
bugs.clusterlabs.org and attach your configuration and logs.

On Linux, the system error number for "Argument list too long" is the
same as the OCF monitor status "Not running", so I suspect that it's a
display issue rather than an actual error, but I'm not sure.
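
For the curious, the numeric overlap is easy to see from a shell (a small
illustration, assuming a Linux box with perl installed):

perl -e '$! = 7; print "$!\n"'    # prints "Argument list too long" -- errno 7 (E2BIG)
# The OCF return code OCF_NOT_RUNNING is also 7, so if the agent's exit status is
# passed through strerror(), a cleanly stopped resource can be reported this way.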

Then the question would just be why is rsyslog stopping.



Re: [ClusterLabs] Antw: [pacemaker+ clvm] Cluster lvm must be active exclusively to create snapshot

2016-12-06 Thread su liu
Yes! I have tried it and succeeded.


Clustered LVM can take a snapshot only in exclusive lock mode.
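
For anyone hitting the same error, a minimal sketch of the sequence (the VG/LV
names are just the ones from the error message; adjust to your setup):

VG=cinder-volumes
LV=volume-4fad87bb-3d4c-4a96-bef1-8799980050d1
lvchange -an  $VG/$LV     # make sure the LV is not active on any other node
lvchange -aey $VG/$LV     # activate exclusively on this node ("vgchange -a e" does it per VG)
lvcreate -s -L 1G -n ${LV}-snap $VG/$LV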



2016-12-06 16:23 GMT+08:00 Ulrich Windl :

> >>> su liu wrote on 06.12.2016 at 02:16 in message:
> > Hi all,
> >
> >
> > I am new to pacemaker and I have some questions about the clvmd +
> > pacemaker + corosync combination. I wish you could explain it for me if
> > you are free. Thank you very much!
> > I have 2 nodes and the pacemaker's status is as follows:
> >
> > [root@controller ~]# pcs status --full
> > Cluster name: mycluster
> > Last updated: Mon Dec  5 18:15:12 2016Last change: Fri Dec  2
> > 15:01:03 2016 by root via cibadmin on compute1
> > Stack: corosync
> > Current DC: compute1 (2) (version 1.1.13-10.el7_2.4-44eb2dd) - partition
> > with quorum
> > 2 nodes and 4 resources configured
> >
> > Online: [ compute1 (2) controller (1) ]
> >
> > Full list of resources:
> >
> >  Clone Set: dlm-clone [dlm]
> >  dlm(ocf::pacemaker:controld):Started compute1
> >  dlm(ocf::pacemaker:controld):Started controller
> >  Started: [ compute1 controller ]
> >  Clone Set: clvmd-clone [clvmd]
> >  clvmd(ocf::heartbeat:clvm):Started compute1
> >  clvmd(ocf::heartbeat:clvm):Started controller
> >  Started: [ compute1 controller ]
> >
> > Node Attributes:
> > * Node compute1 (2):
> > * Node controller (1):
> >
> > Migration Summary:
> > * Node compute1 (2):
> > * Node controller (1):
> >
> > PCSD Status:
> >   controller: Online
> >   compute1: Online
> >
> > Daemon Status:
> >   corosync: active/disabled
> >   pacemaker: active/disabled
> >   pcsd: active/enabled
> >
> >
> >
> > I created an LV on the controller node and it can be seen on the
> > compute1 node immediately with the 'lvs' command, but the LV is not
> > activated on compute1.
> > Then I wanted to create a snapshot of the LV, but it failed with the
> > error message:
> >
> >
> >
> > ### volume-4fad87bb-3d4c-4a96-bef1-8799980050d1 must be active exclusively
> > to create snapshot ###
> > Can someone tell me how to snapshot an LV in a clustered LVM environment?
> > Thank you very much.
>
> Did you try "vgchange -a e ..."?
>
> >
> >
> > Additional information:
> >
> > [root@controller ~]# vgdisplay
> >   --- Volume group ---
> >   VG Name   cinder-volumes
> >   System ID
> >   Formatlvm2
> >   Metadata Areas1
> >   Metadata Sequence No  19
> >   VG Access read/write
> >   VG Status resizable
> >   Clustered yes
> >   Sharedno
> >   MAX LV0
> >   Cur LV1
> >   Open LV   0
> >   Max PV0
> >   Cur PV1
> >   Act PV1
> >   VG Size   1000.00 GiB
> >   PE Size   4.00 MiB
> >   Total PE  255999
> >   Alloc PE / Size   256 / 1.00 GiB
> >   Free  PE / Size   255743 / 999.00 GiB
> >   VG UUID   aLamHi-mMcI-2NsC-Spjm-QWZr-MzHx-pPYSTt
> >
> > [root@controller ~]# rpm -qa |grep pacem
> > pacemaker-cli-1.1.13-10.el7_2.4.x86_64
> > pacemaker-libs-1.1.13-10.el7_2.4.x86_64
> > pacemaker-1.1.13-10.el7_2.4.x86_64
> > pacemaker-cluster-libs-1.1.13-10.el7_2.4.x86_64
> >
> >
> > [root@controller ~]# lvs
> >   LV  VG Attr
> > LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
> >   volume-1b0ea468-37c8-4b47-a6fa-6cce65b068b5 cinder-volumes -wi-a-
> > 1.00g
> >
> >
> > [root@compute1 ~]# lvs
> >   LV  VG Attr
> > LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
> >   volume-1b0ea468-37c8-4b47-a6fa-6cce65b068b5 cinder-volumes -wi---
> > 1.00g
> >
> >
> > thank you very much!
>
>
>
>


Re: [ClusterLabs] How to DRBD + Pacemaker + Samba in Active/Passive Cluster?

2016-12-06 Thread Stanislav Kopp
Hi,

As Ulrich already said, you don't need CTDB for "active/passive"
Samba; we have used the standard LSB RA for the Samba init script without
problems for many years.

Best,
Stan
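
For what it's worth, a minimal sketch of such an active/passive stack with pcs
(the DRBD resource r0, device, mount point, IP and systemd unit name smb are
placeholders for illustration, not a tested configuration):

pcs resource create drbd_r0 ocf:linbit:drbd drbd_resource=r0 op monitor interval=30s
pcs resource master drbd_r0_ms drbd_r0 master-max=1 master-node-max=1 \
    clone-max=2 clone-node-max=1 notify=true
pcs resource create fs_smb ocf:heartbeat:Filesystem device=/dev/drbd0 \
    directory=/srv/samba fstype=ext4
pcs resource create ip_smb ocf:heartbeat:IPaddr2 ip=192.168.100.50 cidr_netmask=24
pcs resource create smbd systemd:smb        # or an LSB/init script resource, as above
pcs resource group add g_samba fs_smb ip_smb smbd
pcs constraint colocation add g_samba with master drbd_r0_ms INFINITY
pcs constraint order promote drbd_r0_ms then start g_samba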

2016-12-05 10:16 GMT+01:00 Semion Itic :
> Hello Everybody,
>
> How to DRBD + Pacemaker + Samba  in Active/Passive Cluster?
>
> I have been searching for many days now for how to integrate drbd + pacemaker
> and corosync in a two-node active/passive cluster (with a service IP) with
> SAMBA. I still don't understand how to go further after mounting the
> filesystem; I want to integrate Samba into the pacemaker setup as a service.
> I saw that the main solution to this is using CTDB, but that seems to be
> very complex to me. So, does anybody have experience with this combination of
> topics and can provide me with instructions or at least some advice?
>
> Regards,
> Simon I
>


[ClusterLabs] Antw: [pacemaker+ clvm] Cluster lvm must be active exclusively to create snapshot

2016-12-06 Thread Ulrich Windl
>>> su liu wrote on 06.12.2016 at 02:16 in message:
> Hi all,
> 
> 
> I am new to pacemaker and I have some questions about the clvmd +
> pacemaker + corosync combination. I wish you could explain it for me if
> you are free. Thank you very much!
> I have 2 nodes and the pacemaker's status is as follows:
> 
> [root@controller ~]# pcs status --full
> Cluster name: mycluster
> Last updated: Mon Dec  5 18:15:12 2016Last change: Fri Dec  2
> 15:01:03 2016 by root via cibadmin on compute1
> Stack: corosync
> Current DC: compute1 (2) (version 1.1.13-10.el7_2.4-44eb2dd) - partition
> with quorum
> 2 nodes and 4 resources configured
> 
> Online: [ compute1 (2) controller (1) ]
> 
> Full list of resources:
> 
>  Clone Set: dlm-clone [dlm]
>  dlm(ocf::pacemaker:controld):Started compute1
>  dlm(ocf::pacemaker:controld):Started controller
>  Started: [ compute1 controller ]
>  Clone Set: clvmd-clone [clvmd]
>  clvmd(ocf::heartbeat:clvm):Started compute1
>  clvmd(ocf::heartbeat:clvm):Started controller
>  Started: [ compute1 controller ]
> 
> Node Attributes:
> * Node compute1 (2):
> * Node controller (1):
> 
> Migration Summary:
> * Node compute1 (2):
> * Node controller (1):
> 
> PCSD Status:
>   controller: Online
>   compute1: Online
> 
> Daemon Status:
>   corosync: active/disabled
>   pacemaker: active/disabled
>   pcsd: active/enabled
> 
> 
> 
> I created an LV on the controller node and it can be seen on the compute1
> node immediately with the 'lvs' command, but the LV is not activated on
> compute1.
> Then I wanted to create a snapshot of the LV, but it failed with the error
> message:
> 
> 
> 
> ### volume-4fad87bb-3d4c-4a96-bef1-8799980050d1 must be active exclusively
> to create snapshot ###
> Can someone tell me how to snapshot an LV in a clustered LVM environment?
> Thank you very much.

Did you try "vgchange -a e ..."?

> 
> 
> Additional information:
> 
> [root@controller ~]# vgdisplay
>   --- Volume group ---
>   VG Name   cinder-volumes
>   System ID
>   Formatlvm2
>   Metadata Areas1
>   Metadata Sequence No  19
>   VG Access read/write
>   VG Status resizable
>   Clustered yes
>   Sharedno
>   MAX LV0
>   Cur LV1
>   Open LV   0
>   Max PV0
>   Cur PV1
>   Act PV1
>   VG Size   1000.00 GiB
>   PE Size   4.00 MiB
>   Total PE  255999
>   Alloc PE / Size   256 / 1.00 GiB
>   Free  PE / Size   255743 / 999.00 GiB
>   VG UUID   aLamHi-mMcI-2NsC-Spjm-QWZr-MzHx-pPYSTt
> 
> [root@controller ~]# rpm -qa |grep pacem
> pacemaker-cli-1.1.13-10.el7_2.4.x86_64
> pacemaker-libs-1.1.13-10.el7_2.4.x86_64
> pacemaker-1.1.13-10.el7_2.4.x86_64
> pacemaker-cluster-libs-1.1.13-10.el7_2.4.x86_64
> 
> 
> [root@controller ~]# lvs
>   LV  VG Attr
> LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   volume-1b0ea468-37c8-4b47-a6fa-6cce65b068b5 cinder-volumes -wi-a-
> 1.00g
> 
> 
> [root@compute1 ~]# lvs
>   LV  VG Attr
> LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   volume-1b0ea468-37c8-4b47-a6fa-6cce65b068b5 cinder-volumes -wi---
> 1.00g
> 
> 
> thank you very much!






[ClusterLabs] Antw: How to DRBD + Pacemaker + Samba in Active/Passive Cluster?

2016-12-06 Thread Ulrich Windl
>>> Semion Itic wrote on 05.12.2016 at 10:16 in message
<7bdd2f4d-7c0a-49b1-b42e-9668befbb...@outlook.de>:
> Hello Everybody,
> 
> How to DRBD + Pacemaker + Samba  in Active/Passive Cluster?
> 
> I have been searching for many days now for how to integrate drbd + pacemaker
> and corosync in a two-node active/passive cluster (with a service IP) with
> SAMBA. I still don't understand how to go further after mounting the
> filesystem; I want to integrate Samba into the pacemaker setup as a service.
> I saw that the main solution to this is using CTDB, but that seems to be
> very complex to me. So, does anybody have experience with this combination of
> topics and can provide me with instructions or at least some advice?

Hi!

Active/passive suggests you want Samba on one node (and thus DRBD in a
master/slave configuration). CTDB is not complex, but CTDB is an active/active
configuration (AFAIK), and you'd need DRBD in a dual-master configuration as well
as a cluster filesystem (for a lock file only, BTW).
CTDB replicates the configuration to the local nodes.

Ulrich
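
For the active/active route, the lock-file requirement mentioned above typically
shows up as the CTDB resource agent's recovery-lock parameter. A rough sketch
(the path and resource name are hypothetical, and /clusterfs would have to be a
cluster filesystem such as GFS2/OCFS2 on dual-primary DRBD):

pcs resource create ctdb ocf:heartbeat:CTDB \
    ctdb_recovery_lock=/clusterfs/ctdb/.ctdb.lock clone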

> 
> Regards,
> Simon I



