Re: [ClusterLabs] corosync - CS_ERR_BAD_HANDLE when multiple nodes are starting up
Hi, again thanks for the response!

Thomas Lamprecht napsal(a):
Hi, thanks for the response! I added some information and clarification below.

On 10/01/2015 09:23 AM, Jan Friesse wrote:
Hi,

Thomas Lamprecht napsal(a):
Hello, we are using corosync version needle (2.3.5) for our cluster filesystem (pmxcfs). The situation is the following. First we start pmxcfs, which is a FUSE filesystem, and if there is a cluster configuration we also start corosync. This allows the filesystem to exist in one-node 'clusters', or to be forced into a local mode. We use CPG to send our messages to all members; the filesystem lives in RAM and all fs operations are sent 'over the wire'. The problem is now the following: when we're restarting all (in my test case 3) nodes at the same time, in 1 out of 10 cases I get only CS_ERR_BAD_HANDLE back when calling

I'm really unsure how to understand what you are doing. You are restarting all nodes and get CS_ERR_BAD_HANDLE? I mean, if you are restarting all nodes, which node returns CS_ERR_BAD_HANDLE? Or are you restarting just pmxcfs? Or just corosync?

Clarification, sorry, I was a bit unspecific. I can see the error behaviour in two cases:
1) I restart three physical hosts (= nodes) at the same time. One of them - normally the last one coming up again - successfully joins the corosync cluster; the filesystem (pmxcfs) notices that, but then cpg_mcast_joined receives only CS_ERR_BAD_HANDLE errors.

Ok, that is weird. Are you able to reproduce the same behavior by restarting pmxcfs? Or is a membership change (= restart of a node) really needed? Also, are you sure the network interface is up when corosync starts?

No, I tried quite a few times to restart pmxcfs but that didn't trigger the problem yet. But I could trigger it once by restarting only one node, so restarting all of them only makes the problem worse but isn't needed in the first place.

corosync.log of the failing node may be interesting.
My nodes' hostnames are [one, two, three]; this time they came up in the order they're named. This time it hit the first and second nodes coming up again. The corosync log seems normal, although I didn't have debug mode enabled; I don't know what difference that would make when no errors show up in the normal log.

Oct 07 09:06:36 [1335] two corosync notice  [MAIN  ] Corosync Cluster Engine ('2.3.5'): started and ready to provide service.
Oct 07 09:06:36 [1335] two corosync info    [MAIN  ] Corosync built-in features: augeas systemd pie relro bindnow
Oct 07 09:06:36 [1335] two corosync notice  [TOTEM ] Initializing transport (UDP/IP Multicast).
Oct 07 09:06:36 [1335] two corosync notice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Oct 07 09:06:36 [1335] two corosync notice  [TOTEM ] The network interface [10.10.1.152] is now up.
Oct 07 09:06:36 [1335] two corosync notice  [SERV  ] Service engine loaded: corosync configuration map access [0]
Oct 07 09:06:36 [1335] two corosync info    [QB    ] server name: cmap
Oct 07 09:06:36 [1335] two corosync notice  [SERV  ] Service engine loaded: corosync configuration service [1]
Oct 07 09:06:36 [1335] two corosync info    [QB    ] server name: cfg
Oct 07 09:06:36 [1335] two corosync notice  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Oct 07 09:06:36 [1335] two corosync info    [QB    ] server name: cpg
Oct 07 09:06:36 [1335] two corosync notice  [SERV  ] Service engine loaded: corosync profile loading service [4]
Oct 07 09:06:36 [1335] two corosync notice  [QUORUM] Using quorum provider corosync_votequorum
Oct 07 09:06:36 [1335] two corosync notice  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Oct 07 09:06:36 [1335] two corosync info    [QB    ] server name: votequorum
Oct 07 09:06:36 [1335] two corosync notice  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Oct 07 09:06:36 [1335] two corosync info    [QB    ] server name: quorum
Oct 07 09:06:36 [1335] two corosync notice  [TOTEM ] A new membership (10.10.1.152:92) was formed. Members joined: 2
Oct 07 09:06:36 [1335] two corosync notice  [QUORUM] Members[1]: 2
Oct 07 09:06:36 [1335] two corosync notice  [MAIN  ] Completed service synchronization, ready to provide service.

Then pmxcfs reports:

Oct 07 09:06:38 two pmxcfs[952]: [status] crit: cpg_send_message failed: 9
Oct 07 09:06:38 two pmxcfs[952]: [status] notice: Bad handle 0
Oct 07 09:06:38 two pmxcfs[952]: [status] crit: cpg_send_message failed: 9
Oct 07 09:06:38 two pmxcfs[952]: [status] crit: cpg_send_message failed: 9
Oct 07 09:06:38 two pmxcfs[952]: [status] notice: Bad handle 0
Oct 07 09:06:38 two pmxcfs[952]: [status] crit: cpg_send_message failed: 9

After that the third node joins, but the CS_ERR_BAD_HANDLE stays.

Oct 07 09:06:41 [1335] two corosync notice  [TOTEM ] A new membership (10.10.1.151:100) was formed. Members joined: 1
Oct 07 09:06:41 [1335] two corosync notice  [QUORUM]
Re: [ClusterLabs] STOP cluster after update resource
Hello. We were looking at ways to use the Corosync/Pacemaker stack to create a high-availability cluster of PostgreSQL servers with automatic failover. We are using Corosync (2.3.4) as the messaging layer and a stateful master/slave resource agent (pgsql) with Pacemaker (1.1.12) on CentOS 7.1. Things work pretty well for a static cluster, where membership is defined up front. However, we need to be able to seamlessly add new machines (nodes) to the cluster and remove existing ones from it, without service interruption. And we ran into a problem. Is it possible to add a new node dynamically without interruption? Do you know a way to add a new node to the cluster without this disruption? Maybe some command or something else?

05.10.2015 13:19, Nikolay Popov wrote:
Hello. I get a STOP cluster status when adding/deleting a new cluster node after running the commands below. How can I add a node without STOPping the cluster? My steps:

# pcs cluster auth pi01 pi02 pi03 pi05 -u hacluster -p hacluster
pi01: Authorized
pi02: Authorized
pi03: Authorized
pi05: Authorized

# pcs cluster node add pi05 --start
pi01: Corosync updated
pi02: Corosync updated
pi03: Corosync updated
pi05: Succeeded
pi05: Starting Cluster...
# pcs resource show --full
 Group: master-group
  Resource: vip-master (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=192.168.242.100 nic=eth0 cidr_netmask=24
   Operations: start interval=0s timeout=60s on-fail=restart (vip-master-start-interval-0s)
               monitor interval=10s timeout=60s on-fail=restart (vip-master-monitor-interval-10s)
               stop interval=0s timeout=60s on-fail=block (vip-master-stop-interval-0s)
  Resource: vip-rep (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=192.168.242.101 nic=eth0 cidr_netmask=24
   Meta Attrs: migration-threshold=0
   Operations: start interval=0s timeout=60s on-fail=stop (vip-rep-start-interval-0s)
               monitor interval=10s timeout=60s on-fail=restart (vip-rep-monitor-interval-10s)
               stop interval=0s timeout=60s on-fail=ignore (vip-rep-stop-interval-0s)
 Master: msPostgresql
  Meta Attrs: master-max=1 master-node-max=1 clone-max=3 clone-node-max=1 notify=true
  Resource: pgsql (class=ocf provider=heartbeat type=pgsql)
   Attributes: pgctl=/usr/pgsql-9.5/bin/pg_ctl psql=/usr/pgsql-9.5/bin/psql pgdata=/var/lib/pgsql/9.5/data/ rep_mode=sync node_list="pi01 pi02 pi03" restore_command="cp /var/lib/pgsql/9.5/data/wal_archive/%f %p" primary_conninfo_opt="user=repl password=super-pass-for-repl keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip=192.168.242.100 restart_on_promote=true check_wal_receiver=true
   Operations: start interval=0s timeout=60s on-fail=restart (pgsql-start-interval-0s)
               monitor interval=4s timeout=60s on-fail=restart (pgsql-monitor-interval-4s)
               monitor role=Master timeout=60s on-fail=restart interval=3s (pgsql-monitor-interval-3s-role-Master)
               promote interval=0s timeout=60s on-fail=restart (pgsql-promote-interval-0s)
               demote interval=0s timeout=60s on-fail=stop (pgsql-demote-interval-0s)
               stop interval=0s timeout=60s on-fail=block (pgsql-stop-interval-0s)
               notify interval=0s timeout=60s (pgsql-notify-interval-0s)

# pcs resource update msPostgresql pgsql master-max=1 master-node-max=1 clone-max=4 clone-node-max=1 notify=true
# pcs resource update pgsql pgsql node_list="pi01 pi02 pi03 pi05"
# crm_mon -Afr1
Last updated: Fri Oct 2 17:07:05 2015
Last change: Fri Oct 2 17:06:37 2015 by root via cibadmin on pi01
Stack: corosync
Current DC: pi02 (version 1.1.13-a14efad) - partition with quorum
4 nodes and 9 resources configured

Online: [ pi01 pi02 pi03 pi05 ]

Full list of resources:

 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2): Stopped
     vip-rep    (ocf::heartbeat:IPaddr2): Stopped
 Master/Slave Set: msPostgresql [pgsql]
     Slaves: [ pi02 ]
     Stopped: [ pi01 pi03 pi05 ]
 fence-pi01 (stonith:fence_ssh): Started pi02
 fence-pi02 (stonith:fence_ssh): Started pi01
 fence-pi03 (stonith:fence_ssh): Started pi01

Node Attributes:
* Node pi01:
    + master-pgsql      : -INFINITY
    + pgsql-data-status : STREAMING|SYNC
    + pgsql-status      : STOP
* Node pi02:
    + master-pgsql      : -INFINITY
    + pgsql-data-status : LATEST
    + pgsql-status      : STOP
* Node pi03:
    + master-pgsql      : -INFINITY
    + pgsql-data-status : STREAMING|POTENTIAL
    + pgsql-status      : STOP
* Node pi05:
    + master-pgsql      : -INFINITY
    + pgsql-status      : STOP

Migration Summary:
* Node pi01:
* Node pi03:
* Node pi02:
* Node pi05:

After some time it worked:

Every 2.0s: crm_mon -Afr1
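As a side note when debugging such a join: whether corosync itself accepted the new node can be checked independently of Pacemaker via the runtime membership map (command sketch; the key prefix is the one used by corosync 2.x, and the output shape will differ per cluster):

```
# List the current totem members as corosync sees them
corosync-cmapctl | grep 'runtime.totem.pg.mrp.srp.members'

# Cross-check what pcs reports for corosync membership
pcs status corosync
```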
[ClusterLabs] parallel execution of resources
hi, I have around 30 services running on each node of a 2-node cluster. When more than one resource fails, pacemaker tries to restart the resources, but does it sequentially. Is it possible for pacemaker to start resources in parallel?

Are there situations where pacemaker can reboot a node automatically? I was working on a node and it suddenly got rebooted. I suspect that pacemaker could be the reason. Was the node rebooted because of pacemaker?

thanks and regards
p.vijay

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
[ClusterLabs] group resources not grouped ?!?
Hi,
I've got a problem I don't understand; maybe someone can give me a hint.

My 2-node cluster (nodes named ali and baba) is configured to run mysql, an IP for mysql, and the filesystem resource (on the DRBD master) together as a GROUP. After doing some crash-tests I ended up with the filesystem and mysql running happily on one host (ali), and the related IP on the other (baba) - although the IP is not really up and running, crm_mon just SHOWS it as started there. In fact it's up nowhere, neither on ali nor on baba.

crm_mon shows that pacemaker tried to start it on baba, but gave up after fail-count=100.

Q1: why doesn't pacemaker put the IP on ali, where all the rest of its group lives?
Q2: why doesn't pacemaker try to start the IP on ali, after the max failcount has been reached on baba?
Q3: why is crm_mon showing the IP as "started" when it's down after 10 tries?

Thanks :)

config (some parts removed):
---
node ali
node baba
primitive res_drbd ocf:linbit:drbd \
        params drbd_resource="r0" \
        op stop interval="0" timeout="100" \
        op start interval="0" timeout="240" \
        op promote interval="0" timeout="90" \
        op demote interval="0" timeout="90" \
        op notify interval="0" timeout="90" \
        op monitor interval="40" role="Slave" timeout="20" \
        op monitor interval="20" role="Master" timeout="20"
primitive res_fs ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/drbd_mnt" fstype="ext4" \
        op monitor interval="30s"
primitive res_hamysql_ip ocf:heartbeat:IPaddr2 \
        params ip="XXX.XXX.XXX.224" nic="eth0" cidr_netmask="23" \
        op monitor interval="10s" timeout="20s" depth="0"
primitive res_mysql lsb:mysql \
        op start interval="0" timeout="15" \
        op stop interval="0" timeout="15" \
        op monitor start-delay="30" interval="15" time-out="15"
group gr_mysqlgroup res_fs res_mysql res_hamysql_ip \
        meta target-role="Started"
ms ms_drbd res_drbd \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation col_fs_on_drbd_master inf: res_fs:Started ms_drbd:Master
order ord_drbd_master_then_fs inf: ms_drbd:promote res_fs:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
        cluster-infrastructure="openais" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        expected-quorum-votes="2" \
        last-lrm-refresh="1438857246"

crm_mon -rnf (some parts removed):
-
Node ali: online
        res_fs (ocf::heartbeat:Filesystem) Started
        res_mysql (lsb:mysql) Started
        res_drbd:0 (ocf::linbit:drbd) Master
Node baba: online
        res_hamysql_ip (ocf::heartbeat:IPaddr2) Started
        res_drbd:1 (ocf::linbit:drbd) Slave

Inactive resources:

Migration summary:
* Node baba:
   res_hamysql_ip: migration-threshold=100 fail-count=100

Failed actions:
    res_hamysql_ip_stop_0 (node=a891vl107s, call=35, rc=1, status=complete): unknown error

corosync.log:
--
pengine: [1223]: WARN: should_dump_input: Ignoring requirement that res_hamysql_ip_stop_0 complete before gr_mysqlgroup_stopped_0: unmanaged failed resources cannot prevent shutdown
pengine: [1223]: WARN: should_dump_input: Ignoring requirement that res_hamysql_ip_stop_0 complete before gr_mysqlgroup_stopped_0: unmanaged failed resources cannot prevent shutdown

Software:
--
corosync 1.2.1-4
pacemaker 1.0.9.1+hg15626-1
drbd8-utils 2:8.3.7-2.1
(for some reason it's not possible to update at this time)
[ClusterLabs] Antw: parallel execution of resources
>>> Vijay Partha schrieb am 07.10.2015 um 11:39 in Nachricht:
> hi,
>
> I have around 30 services running in each node of a 2 node cluster.
> when more than one resource fails pacemaker tries to restart the resources
> but does it in a sequential way. is it possible for pacemaker to start
> resources parallely?

Normally it does (while following ordering constraints).

> are there situations where pacemaker can reboot a node automatically? I
> was working on a node and suddenly got rebooted. I am having a thought that
> pacemaker could be the reason. is the node rebooted because of pacemaker?

It's called "fencing". Maybe have a look at your syslog or cluster logs.

Regards,
Ulrich

> thanks and regards
> p.vijay
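For completeness: besides ordering constraints, the amount of parallelism is capped by the batch-limit cluster property (the maximum number of actions the cluster executes at once). A command sketch of inspecting and raising it with pcs - the value 50 is purely illustrative:

```
# Show the current value (falls back to Pacemaker's built-in default if unset)
pcs property list --all | grep batch-limit

# Allow up to 50 actions to run in parallel across the cluster
pcs property set batch-limit=50
```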
[ClusterLabs] Antw: group resources not grouped ?!?
>>> zulucloud schrieb am 07.10.2015 um 16:12 in Nachricht <5615284e.8050...@mailbox.org>:
> Hi,
> i got a problem i don't understand, maybe someone can give me a hint.
>
> My 2-node cluster (named ali and baba) is configured to run mysql, an IP
> for mysql and the filesystem resource (on drbd master) together as a
> GROUP. After doing some crash-tests i ended up having filesystem and
> mysql running happily on one host (ali), and the related IP on the other
> (baba) although, the IP's not really up and running, crm_mon just
> SHOWS it as started there. In fact it's nowhere up, neither on ali nor
> on baba.

Then it's most likely a bug in the resource agent. To make sure, try "crm resource reprobe" and be patient for some seconds after that. Then recheck the displayed status.

> crm_mon shows that pacemaker tried to start it on baba, but gave up
> after fail-count=100.

This could mean: multiple start attempts failed, as did stop attempts, so the cluster thinks it might be running. It looks very much like a configuration problem to me.

> Q1: why doesn't pacemaker put the IP on ali, where all the rest of it's
> group lives?

See the log files in detail.

> Q2: why doesn't pacemaker try to start the IP on ali, after max
> failcount had been reached on baba?

Do you have fencing enabled?

> Q3: why is crm_mon showing the IP as "started", when it's down after
> 10 tries?

See above.
> Thanks :)

8-)

> config (some parts removed):
> ---
> node ali
> node baba
>
> primitive res_drbd ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op stop interval="0" timeout="100" \
>     op start interval="0" timeout="240" \
>     op promote interval="0" timeout="90" \
>     op demote interval="0" timeout="90" \
>     op notify interval="0" timeout="90" \
>     op monitor interval="40" role="Slave" timeout="20" \
>     op monitor interval="20" role="Master" timeout="20"
> primitive res_fs ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/drbd_mnt" fstype="ext4" \
>     op monitor interval="30s"
> primitive res_hamysql_ip ocf:heartbeat:IPaddr2 \
>     params ip="XXX.XXX.XXX.224" nic="eth0" cidr_netmask="23" \
>     op monitor interval="10s" timeout="20s" depth="0"
> primitive res_mysql lsb:mysql \
>     op start interval="0" timeout="15" \
>     op stop interval="0" timeout="15" \
>     op monitor start-delay="30" interval="15" time-out="15"
>
> group gr_mysqlgroup res_fs res_mysql res_hamysql_ip \
>     meta target-role="Started"
> ms ms_drbd res_drbd \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>
> colocation col_fs_on_drbd_master inf: res_fs:Started ms_drbd:Master
>
> order ord_drbd_master_then_fs inf: ms_drbd:promote res_fs:start
>
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
>     cluster-infrastructure="openais" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore" \
>     expected-quorum-votes="2" \
>     last-lrm-refresh="1438857246"
>
> crm_mon -rnf (some parts removed):
> -
> Node ali: online
>     res_fs (ocf::heartbeat:Filesystem) Started
>     res_mysql (lsb:mysql) Started
>     res_drbd:0 (ocf::linbit:drbd) Master
> Node baba: online
>     res_hamysql_ip (ocf::heartbeat:IPaddr2) Started
>     res_drbd:1 (ocf::linbit:drbd) Slave
>
> Inactive resources:
>
> Migration summary:
> * Node baba:
>     res_hamysql_ip: migration-threshold=100 fail-count=100
>
> Failed actions:
>     res_hamysql_ip_stop_0 (node=a891vl107s, call=35, rc=1, status=complete): unknown error
>
> corosync.log:
> --
> pengine: [1223]: WARN: should_dump_input: Ignoring requirement that
> res_hamysql_ip_stop_0 complete before gr_mysqlgroup_stopped_0:
> unmanaged failed resources cannot prevent shutdown
>
> pengine: [1223]: WARN: should_dump_input: Ignoring requirement that
> res_hamysql_ip_stop_0 complete before gr_mysqlgroup_stopped_0:
> unmanaged failed resources cannot prevent shutdown
>
> Software:
> --
> corosync 1.2.1-4
> pacemaker 1.0.9.1+hg15626-1
> drbd8-utils 2:8.3.7-2.1
> (for some reason it's not possible to update at this time)
Re: [ClusterLabs] Antw: group resources not grouped ?!?
On 10/07/2015 04:46 PM, Ulrich Windl wrote:
> zulucloud schrieb am 07.10.2015 um 16:12 in Nachricht <5615284e.8050...@mailbox.org>:
>> Hi, i got a problem i don't understand, maybe someone can give me a hint.
>>
>> My 2-node cluster (named ali and baba) is configured to run mysql, an IP
>> for mysql and the filesystem resource (on drbd master) together as a
>> GROUP. After doing some crash-tests i ended up having filesystem and
>> mysql running happily on one host (ali), and the related IP on the other
>> (baba) although, the IP's not really up and running, crm_mon just
>> SHOWS it as started there. In fact it's nowhere up, neither on ali nor
>> on baba.
>
> Then it's most likely a bug in the resource agent. To make sure, try
> "crm resource reprobe" and be patient for some seconds after that.
> Then recheck the displayed status.

In the meantime I already did a "resource cleanup res_hamysql_ip". The failcounts etc. disappeared. After that, a "start gr_mysqlgroup" started everything without a hassle on the correct node.

>> crm_mon shows that pacemaker tried to start it on baba, but gave up
>> after fail-count=100.
>
> This could mean: multiple start attempts failed, as did stop attempts,
> so the cluster thinks it might be running. It looks very much like a
> configuration problem to me.

>> Q1: why doesn't pacemaker put the IP on ali, where all the rest of its
>> group lives?
>
> See the log files in detail.

Well, they're quite verbose and a little bit cryptic... ;) I didn't find anything that could enlighten that for me...

>> Q2: why doesn't pacemaker try to start the IP on ali, after max
>> failcount had been reached on baba?
>
> Do you have fencing enabled?

No. These are 2 virtual machines running together with some other VMs on 2 physical VMware servers. Could you give me a suggestion on how to implement fencing in that situation?
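For VMs on VMware hosts, a commonly used option is the fence_vmware_soap agent, which powers VMs off through vCenter/ESXi. A rough sketch of one such stonith device - the address, credentials and VM name below are hypothetical placeholders, not values from this thread:

```
# One device per node to be fenced; "port" is the VM's name in the
# vSphere inventory (all values here are illustrative).
pcs stonith create fence-ali fence_vmware_soap \
    ipaddr=vcenter.example.com login=fenceuser passwd=secret ssl=1 \
    port="ali-vm-name" pcmk_host_list="ali" \
    op monitor interval=60s
```

A second device would be needed for baba, and stonith-enabled=true set afterwards.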
thx
Re: [ClusterLabs] Antw: crm_report consumes all available RAM
On Tue, Oct 06, 2015 at 11:50:00PM +0200, Jan Pokorný wrote:
> On 06/10/15 10:28 +0200, Dejan Muhamedagic wrote:
>> On Mon, Oct 05, 2015 at 07:00:18PM +0300, Vladislav Bogdanov wrote:
>>> 14.09.2015 02:31, Andrew Beekhof wrote:
>>>> On 8 Sep 2015, at 10:18 pm, Ulrich Windl wrote:
>>>> Vladislav Bogdanov schrieb am 08.09.2015 um 14:05 in Nachricht <55eecefb.8050...@hoster-ok.com>:
>>>>> Hi,
>>>>>
>>>>> just discovered a very interesting issue.
>>>>> If there is a system user with a very big UID (8002 in my case),
>>>>> then crm_report (actually the 'grep' it runs) consumes too much RAM.
>>>>>
>>>>> Relevant part of the process tree at that moment looks like (word-wrap off):
>>>>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>>>>> ...
>>>>> root 25526 0.0 0.0 106364 636 ? S 12:37 0:00 \_ /bin/sh /usr/sbin/crm_report --dest=/var/log/crm_report -f -01-01 00:00:00
>>>>> root 25585 0.0 0.0 106364 636 ? S 12:37 0:00 \_ bash /var/log/crm_report/collector
>>>>> root 25613 0.0 0.0 106364 152 ? S 12:37 0:00 \_ bash /var/log/crm_report/collector
>>>>> root 25614 0.0 0.0 106364 692 ? S 12:37 0:00 \_ bash /var/log/crm_report/collector
>>>>> root 27965 4.9 0.0 100936 452 ? S 12:38 0:01 | \_ cat /var/log/lastlog
>>>>> root 27966 23.0 82.9 3248996 1594688 ? D 12:38 0:08 | \_ grep -l -e Starting Pacemaker

Whoa. grep using up 1.5 gig resident (3.2 gig virtual), still looking for the first newline.

I suggest, in addition to the (good) suggestions so far, to also set a ulimit.

1) export LC_ALL=C, so grep won't take quadratic time trying to make sure it understands unicode correctly; yes, I'm sure that bug has been fixed on most systems meanwhile...

2) ( ulimit -v 10 ; grep ... )

Usually, even with "very many very long lines", my grep stays below a few (~3) megabytes. A limit of 100M seems to be way too much, but if it thinks it needs that much RAM to find a short string, then we are very likely not interested in that file.
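Lars's two suggestions can be reproduced against a sparse file like lastlog; the little demo below (file name and sizes are arbitrary) shows grep being stopped by the ulimit instead of exhausting RAM. Either way grep exits non-zero: 1 if it scans the file without a match, 2 if it runs out of memory first.

```shell
# Emulate /var/log/lastlog: a huge sparse file of NUL bytes with no
# newlines (1 GiB apparent size, almost no disk usage).
truncate -s 1G /tmp/sparse_demo

# Run grep in a subshell capped at ~100 MB of virtual memory, with
# LC_ALL=C. The file has no newline, so grep keeps growing its line
# buffer; the ulimit makes it give up quickly instead of eating all RAM.
if ( ulimit -v 102400; LC_ALL=C grep -q 'Starting Pacemaker' /tmp/sparse_demo ) 2>/dev/null; then
    echo "pattern found"
else
    echo "grep stopped without a match"
fi

rm -f /tmp/sparse_demo
```

The subshell keeps the ulimit from leaking into the rest of the script, which is why Lars wraps the whole pipeline in `( ... )`.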
--
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Re: [ClusterLabs] Antw: crm_report consumes all available RAM
On Tue, Oct 06, 2015 at 11:50:00PM +0200, Jan Pokorný wrote:
> On 06/10/15 10:28 +0200, Dejan Muhamedagic wrote:
>> On Mon, Oct 05, 2015 at 07:00:18PM +0300, Vladislav Bogdanov wrote:
>>> 14.09.2015 02:31, Andrew Beekhof wrote:
>>>> On 8 Sep 2015, at 10:18 pm, Ulrich Windl wrote:
>>>> Vladislav Bogdanov schrieb am 08.09.2015 um 14:05 in Nachricht <55eecefb.8050...@hoster-ok.com>:
>>>>> Hi,
>>>>>
>>>>> just discovered a very interesting issue.
>>>>> If there is a system user with a very big UID (8002 in my case),
>>>>> then crm_report (actually the 'grep' it runs) consumes too much RAM.
>>>>>
>>>>> Relevant part of the process tree at that moment looks like (word-wrap off):
>>>>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>>>>> ...
>>>>> root 25526 0.0 0.0 106364 636 ? S 12:37 0:00 \_ /bin/sh /usr/sbin/crm_report --dest=/var/log/crm_report -f -01-01 00:00:00
>>>>> root 25585 0.0 0.0 106364 636 ? S 12:37 0:00 \_ bash /var/log/crm_report/collector
>>>>> root 25613 0.0 0.0 106364 152 ? S 12:37 0:00 \_ bash /var/log/crm_report/collector
>>>>> root 25614 0.0 0.0 106364 692 ? S 12:37 0:00 \_ bash /var/log/crm_report/collector
>>>>> root 27965 4.9 0.0 100936 452 ? S 12:38 0:01 | \_ cat /var/log/lastlog
>>>>> root 27966 23.0 82.9 3248996 1594688 ? D 12:38 0:08 | \_ grep -l -e Starting Pacemaker
>>>>> root 25615 0.0 0.0 155432 600 ? S 12:37 0:00 \_ sort -u
>>>>>
>>>>> ls -ls /var/log/lastlog shows:
>>>>> 40 -rw-r--r--. 1 root root 2336876 Sep 8 04:36 /var/log/lastlog
>>>>>
>>>>> That is a sparse binary file, which consumes only 40k of disk space.
>>>>> At the same time its size is 23GB, and grep takes all the RAM trying
>>>>> to grep a string from 23GB of mostly zeroes without newlines.
>>>>>
>>>>> I believe this is worth fixing,
>>>>
>>>> Shouldn’t this be directed to the grep folks?
>>>
>>> Actually, not everything in /var/log are textual logs. Currently
>>> findmsg() [z,bz,xz]cats _every_ file there and greps for a pattern.
>>> Shouldn't it skip some well-known ones? btmp, lastlog and wtmp are
>>> good candidates to be skipped. They are not intended to be handled
>>> as text.
>>>
>>> Or maybe just test that the file is text in find_decompressor() and
>>> don't cat it if it is not?
>>>
>>> something like
>>> find_decompressor() {
>>>     if echo $1 | grep -qs 'bz2$'; then
>>>         echo "bzip2 -dc"
>>>     elif echo $1 | grep -qs 'gz$'; then
>>>         echo "gzip -dc"
>>>     elif echo $1 | grep -qs 'xz$'; then
>>>         echo "xz -dc"
>>>     elif file $1 | grep -qs 'text'; then
>>>         echo "cat"
>>>     else
>>>         echo "echo"
>>
>> Good idea.
>
> Even better might be using process substitution and avoid cat'ing if
> not needed even for plain text files, assuming GNU grep 2.13+ that,
> in combination with the kernel, attempts to detect sparse files,
> marking them as binary files[1], which can then be utilized in
> combination with the -I option.

Something like the below, maybe. Untested direct-to-email PoC code.

if echo . | grep -q -I . 2>/dev/null; then
    have_grep_dash_I=true
else
    have_grep_dash_I=false
fi
# similar checks can be made for other decompressors

mygrep()
{
    (
    # sub shell for ulimit

    # ulimit -v ... but maybe someone wants to mmap a huge file,
    # and limiting the virtual size cripples mmap unnecessarily,
    # so let's limit resident size instead. Let's be generous: when
    # decompressing stuff that was compressed with xz -9, we may
    # need ~65 MB according to my man page, and if it was generated
    # by something else, the decompressor may need even more.
    # Grep itself should not use much more than single-digit MB,
    # so if the pipeline below needs more than 200 MB resident,
    # we probably are not interested in that file in any case.
    #
    ulimit -m 20

    # Actually no need for "local" anymore,
    # this is a subshell already. Just a habit.

    local file=$1
    case $file in
    *.bz2) bzgrep "$file";;   # or bzip2 -dc | grep, if you prefer
    *.gz)  zgrep "$file";;
    *.xz)  xzgrep "$file";;
    # ...
    *)
        local file_type=$(file "$file")
        case $file_type in
        *text*)
            grep "$file" ;;
        *)
            # try anyway, let grep use its own heuristic
            $have_grep_dash_I && grep --binary-files=without-match "$file" ;;
        esac ;;
    esac
    )
}
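Vladislav's suffix-plus-file(1) idea can be packaged as a small self-contained function; the variant below is purely illustrative (a single case statement instead of repeated greps) and never streams a binary file such as lastlog:

```shell
#!/bin/sh
# Pick the command used to stream a log file: a decompressor for known
# suffixes, "cat" for files that file(1) reports as text, and a bare
# "echo" otherwise, so binary files are only named, never dumped.
find_decompressor() {
    case "$1" in
        *.bz2) echo "bzip2 -dc" ;;
        *.gz)  echo "gzip -dc" ;;
        *.xz)  echo "xz -dc" ;;
        *)
            if file "$1" | grep -qs text; then
                echo "cat"
            else
                echo "echo"
            fi ;;
    esac
}

# Demo: a text file, a sparse binary file, and a suffix-only match
# (the .gz name is never opened, only pattern-matched).
printf 'Starting Pacemaker\n' > /tmp/demo.txt
truncate -s 1M /tmp/demo.bin
find_decompressor /tmp/demo.txt
find_decompressor /tmp/demo.bin
find_decompressor /var/log/messages.gz
rm -f /tmp/demo.txt /tmp/demo.bin
```

The caller would then do `$(find_decompressor "$f") "$f" | grep ...`, exactly as crm_report's findmsg pipes files today.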
Re: [ClusterLabs] Antw: crm_report consumes all available RAM
On Wed, Oct 07, 2015 at 05:39:01PM +0200, Lars Ellenberg wrote:
> Something like the below, maybe.
> Untested direct-to-email PoC code.
>
> if echo . | grep -q -I . 2>/dev/null; then
>     have_grep_dash_I=true
> else
>     have_grep_dash_I=false
> fi
> # similar checks can be made for other decompressors
>
> mygrep()
> {
>     (
>     # sub shell for ulimit
>
>     # ulimit -v ... but maybe someone wants to mmap a huge file,
>     # and limiting the virtual size cripples mmap unnecessarily,
>     # so let's limit resident size instead. Let's be generous: when
>     # decompressing stuff that was compressed with xz -9, we may
>     # need ~65 MB according to my man page, and if it was generated
>     # by something else, the decompressor may need even more.
>     # Grep itself should not use much more than single-digit MB,
>     # so if the pipeline below needs more than 200 MB resident,
>     # we probably are not interested in that file in any case.
>     #
>     ulimit -m 20

Bah, scratch that:

    RLIMIT_RSS
        No longer has any effect on Linux 2.6.

So we are back to ulimit -v 20.

>     # Actually no need for "local" anymore,
>     # this is a subshell already. Just a habit.
>
>     local file=$1
>     case $file in
>     *.bz2) bzgrep "$file";;   # or bzip2 -dc | grep, if you prefer
>     *.gz)  zgrep "$file";;
>     *.xz)  xzgrep "$file";;
>     # ...
>     *)
>         local file_type=$(file "$file")
>         case $file_type in
>         *text*)
>             grep "$file" ;;
>         *)
>             # try anyway, let grep use its own heuristic
>             $have_grep_dash_I && grep --binary-files=without-match "$file" ;;
>         esac ;;
>     esac
>     )
> }
Re: [ClusterLabs] group resources not grouped ?!?
On 10/07/2015 09:12 AM, zulucloud wrote:
> Hi,
> i got a problem i don't understand, maybe someone can give me a hint.
>
> My 2-node cluster (named ali and baba) is configured to run mysql, an IP
> for mysql and the filesystem resource (on drbd master) together as a
> GROUP. After doing some crash-tests i ended up having filesystem and
> mysql running happily on one host (ali), and the related IP on the other
> (baba) although, the IP's not really up and running, crm_mon just
> SHOWS it as started there. In fact it's nowhere up, neither on ali nor
> on baba.
>
> crm_mon shows that pacemaker tried to start it on baba, but gave up
> after fail-count=100.
>
> Q1: why doesn't pacemaker put the IP on ali, where all the rest of its
> group lives?
> Q2: why doesn't pacemaker try to start the IP on ali, after max
> failcount had been reached on baba?
> Q3: why is crm_mon showing the IP as "started", when it's down after
> 10 tries?
>
> Thanks :)
>
> config (some parts removed):
> ---
> node ali
> node baba
>
> primitive res_drbd ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op stop interval="0" timeout="100" \
>     op start interval="0" timeout="240" \
>     op promote interval="0" timeout="90" \
>     op demote interval="0" timeout="90" \
>     op notify interval="0" timeout="90" \
>     op monitor interval="40" role="Slave" timeout="20" \
>     op monitor interval="20" role="Master" timeout="20"
> primitive res_fs ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/drbd_mnt" fstype="ext4" \
>     op monitor interval="30s"
> primitive res_hamysql_ip ocf:heartbeat:IPaddr2 \
>     params ip="XXX.XXX.XXX.224" nic="eth0" cidr_netmask="23" \
>     op monitor interval="10s" timeout="20s" depth="0"
> primitive res_mysql lsb:mysql \
>     op start interval="0" timeout="15" \
>     op stop interval="0" timeout="15" \
>     op monitor start-delay="30" interval="15" time-out="15"
>
> group gr_mysqlgroup res_fs res_mysql res_hamysql_ip \
>     meta target-role="Started"
> ms ms_drbd res_drbd \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>
> colocation col_fs_on_drbd_master inf: res_fs:Started ms_drbd:Master
>
> order ord_drbd_master_then_fs inf: ms_drbd:promote res_fs:start
>
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
>     cluster-infrastructure="openais" \
>     stonith-enabled="false" \

Not having stonith is part of the problem (see below). Without stonith, if the two nodes go into split brain (both up but unable to communicate with each other), Pacemaker will try to promote DRBD to master on both nodes, mount the filesystem on both nodes, and start MySQL on both nodes.

>     no-quorum-policy="ignore" \
>     expected-quorum-votes="2" \
>     last-lrm-refresh="1438857246"
>
> crm_mon -rnf (some parts removed):
> -
> Node ali: online
>     res_fs (ocf::heartbeat:Filesystem) Started
>     res_mysql (lsb:mysql) Started
>     res_drbd:0 (ocf::linbit:drbd) Master
> Node baba: online
>     res_hamysql_ip (ocf::heartbeat:IPaddr2) Started
>     res_drbd:1 (ocf::linbit:drbd) Slave
>
> Inactive resources:
>
> Migration summary:
>
> * Node baba:
>     res_hamysql_ip: migration-threshold=100 fail-count=100
>
> Failed actions:
>     res_hamysql_ip_stop_0 (node=a891vl107s, call=35, rc=1,
>     status=complete): unknown error

The "_stop_" above means that a *stop* action on the IP failed. Pacemaker tried to migrate the IP by first stopping it on baba, but it couldn't. (Since the IP is the last member of the group, its failure didn't prevent the other members from moving.)

Normally, when a stop fails, Pacemaker fences the node so it can safely bring up the resource on the other node. But you disabled stonith, so it got into this state.

So, to proceed:

1) Stonith would help :)

2) Figure out why it couldn't stop the IP. There might be a clue in the logs on baba (though they are indeed hard to follow; search for "res_hamysql_stop_0" around this time, and look around there).
You could also try adding and removing the IP manually, first with the usual OS commands, and if that works, by calling the IP resource agent directly. That often turns up the problem.

> corosync.log:
> --
> pengine: [1223]: WARN: should_dump_input: Ignoring requirement that
> res_hamysql_ip_stop_0 complete before gr_mysqlgroup_stopped_0:
> unmanaged failed resources cannot prevent shutdown
>
> pengine: [1223]: WARN: should_dump_input: Ignoring requirement that
> res_hamysql_ip_stop_0 complete before gr_mysqlgroup_stopped_0:
> unmanaged failed resources cannot prevent shutdown
>
> Software:
> --
> corosync 1.2.1-4
> pacemaker 1.0.9.1+hg15626-1
> drbd8-utils 2:8.3.7-2.1
> (for some reason it's not possible to update at this time)

It should be possible to get
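Calling the IP resource agent by hand can be sketched like this - OCF agents take their parameters via OCF_RESKEY_* environment variables; the address below is an illustrative placeholder (the real one is masked in the config above) and the agent path assumes a standard resource-agents install:

```
export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_ip="192.0.2.224" \
       OCF_RESKEY_nic="eth0" \
       OCF_RESKEY_cidr_netmask="23"

agent=/usr/lib/ocf/resource.d/heartbeat/IPaddr2
$agent start;   echo "start rc=$?"    # 0 = success
$agent monitor; echo "monitor rc=$?"  # 0 = running, 7 = not running
$agent stop;    echo "stop rc=$?"     # a non-zero rc here is the failure crm_mon reported
```

If start/stop work cleanly by hand but fail under Pacemaker, compare the environment (PATH, SELinux context) the cluster runs the agent in.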