Re: [ClusterLabs] corosync - CS_ERR_BAD_HANDLE when multiple nodes are starting up
Hi, again thanks for the response!

Thomas Lamprecht napsal(a):
Hi, thanks for the response! I added some information and clarification below.

On 10/01/2015 09:23 AM, Jan Friesse wrote:
Hi,

Thomas Lamprecht napsal(a):
Hello, we are using corosync version needle (2.3.5) for our cluster filesystem (pmxcfs). The situation is the following. First we start pmxcfs, which is a FUSE filesystem, and if there is a cluster configuration we also start corosync. This allows the filesystem to exist in one-node 'clusters', or to be forced into a local mode. We use CPG to send our messages to all members; the filesystem lives in RAM and all fs operations are sent 'over the wire'. The problem is now the following: when we're restarting all (in my test case 3) nodes at the same time, in 1 out of 10 cases I get only CS_ERR_BAD_HANDLE back when calling

I'm really unsure how to understand what you are doing. You are restarting all nodes and get CS_ERR_BAD_HANDLE? I mean, if you are restarting all nodes, which node returns CS_ERR_BAD_HANDLE? Or are you restarting just pmxcfs? Or just corosync?

Clarification, sorry, I was a bit unspecific. I can see the error behaviour in two cases:
1) I restart three physical hosts (= nodes) at the same time. One of them - normally the last one coming up again - successfully joins the corosync cluster; the filesystem (pmxcfs) notices that, but then cpg_mcast_joined receives only CS_ERR_BAD_HANDLE errors.

Ok, that is weird. Are you able to reproduce the same behavior by restarting pmxcfs? Or is a membership change (= restart of a node) really needed? Also, are you sure the network interface is up when corosync starts?

No, I tried quite a few times to restart pmxcfs but that didn't trigger the problem yet. But I could trigger it once by restarting only one node, so restarting all of them only makes the problem worse but isn't needed in the first place.

corosync.log of the failing node may be interesting.
My nodes' hostnames are [one, two, three]; this time they came up in the order they're named. This time it hit the first and second nodes coming up again. The corosync log seems normal, although I didn't have debug mode enabled; I don't know what difference that would make when no errors show up in the normal log.

Oct 07 09:06:36 [1335] two corosync notice  [MAIN  ] Corosync Cluster Engine ('2.3.5'): started and ready to provide service.
Oct 07 09:06:36 [1335] two corosync info    [MAIN  ] Corosync built-in features: augeas systemd pie relro bindnow
Oct 07 09:06:36 [1335] two corosync notice  [TOTEM ] Initializing transport (UDP/IP Multicast).
Oct 07 09:06:36 [1335] two corosync notice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Oct 07 09:06:36 [1335] two corosync notice  [TOTEM ] The network interface [10.10.1.152] is now up.
Oct 07 09:06:36 [1335] two corosync notice  [SERV  ] Service engine loaded: corosync configuration map access [0]
Oct 07 09:06:36 [1335] two corosync info    [QB    ] server name: cmap
Oct 07 09:06:36 [1335] two corosync notice  [SERV  ] Service engine loaded: corosync configuration service [1]
Oct 07 09:06:36 [1335] two corosync info    [QB    ] server name: cfg
Oct 07 09:06:36 [1335] two corosync notice  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Oct 07 09:06:36 [1335] two corosync info    [QB    ] server name: cpg
Oct 07 09:06:36 [1335] two corosync notice  [SERV  ] Service engine loaded: corosync profile loading service [4]
Oct 07 09:06:36 [1335] two corosync notice  [QUORUM] Using quorum provider corosync_votequorum
Oct 07 09:06:36 [1335] two corosync notice  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Oct 07 09:06:36 [1335] two corosync info    [QB    ] server name: votequorum
Oct 07 09:06:36 [1335] two corosync notice  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Oct 07 09:06:36 [1335] two corosync info    [QB    ] server name: quorum
Oct 07 09:06:36 [1335] two corosync notice  [TOTEM ] A new membership (10.10.1.152:92) was formed. Members joined: 2
Oct 07 09:06:36 [1335] two corosync notice  [QUORUM] Members[1]: 2
Oct 07 09:06:36 [1335] two corosync notice  [MAIN  ] Completed service synchronization, ready to provide service.

Then pmxcfs reports:

Oct 07 09:06:38 two pmxcfs[952]: [status] crit: cpg_send_message failed: 9
Oct 07 09:06:38 two pmxcfs[952]: [status] notice: Bad handle 0
Oct 07 09:06:38 two pmxcfs[952]: [status] crit: cpg_send_message failed: 9
Oct 07 09:06:38 two pmxcfs[952]: [status] crit: cpg_send_message failed: 9
Oct 07 09:06:38 two pmxcfs[952]: [status] notice: Bad handle 0
Oct 07 09:06:38 two pmxcfs[952]: [status] crit: cpg_send_message failed: 9

After that the third node joins, but the CS_ERR_BAD_HANDLE stays.

Oct 07 09:06:41 [1335] two corosync notice  [TOTEM ] A new membership (10.10.1.151:100) was formed. Members joined: 1
Oct 07 09:06:41 [1335] two corosync notice  [QUORUM]
Re: [ClusterLabs] STOP cluster after update resource
Hello. We were looking at ways to use the Corosync/Pacemaker stack to create a high-availability cluster of PostgreSQL servers with automatic failover. We are using Corosync (2.3.4) as the messaging layer and a stateful master/slave resource agent (pgsql) with Pacemaker (1.1.12) on CentOS 7.1. Things work pretty well for a static cluster, where membership is defined up front. However, we need to be able to seamlessly add new machines (nodes) to the cluster and remove existing ones from it, without service interruption. And we ran into a problem. Is it possible to add a new node dynamically without interruption? Do you know a way to add a new node to the cluster without this disruption? Maybe some command or something else?

05.10.2015 13:19, Nikolay Popov wrote:
Hello. I get a STOP cluster status when adding/deleting a new cluster node after running the commands below. How can I add a node without STOPping the cluster? My steps:

# pcs cluster auth pi01 pi02 pi03 pi05 -u hacluster -p hacluster
pi01: Authorized
pi02: Authorized
pi03: Authorized
pi05: Authorized

# pcs cluster node add pi05 --start
pi01: Corosync updated
pi02: Corosync updated
pi03: Corosync updated
pi05: Succeeded
pi05: Starting Cluster...
# pcs resource show --full
 Group: master-group
  Resource: vip-master (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=192.168.242.100 nic=eth0 cidr_netmask=24
   Operations: start interval=0s timeout=60s on-fail=restart (vip-master-start-interval-0s)
               monitor interval=10s timeout=60s on-fail=restart (vip-master-monitor-interval-10s)
               stop interval=0s timeout=60s on-fail=block (vip-master-stop-interval-0s)
  Resource: vip-rep (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=192.168.242.101 nic=eth0 cidr_netmask=24
   Meta Attrs: migration-threshold=0
   Operations: start interval=0s timeout=60s on-fail=stop (vip-rep-start-interval-0s)
               monitor interval=10s timeout=60s on-fail=restart (vip-rep-monitor-interval-10s)
               stop interval=0s timeout=60s on-fail=ignore (vip-rep-stop-interval-0s)
 Master: msPostgresql
  Meta Attrs: master-max=1 master-node-max=1 clone-max=3 clone-node-max=1 notify=true
  Resource: pgsql (class=ocf provider=heartbeat type=pgsql)
   Attributes: pgctl=/usr/pgsql-9.5/bin/pg_ctl psql=/usr/pgsql-9.5/bin/psql pgdata=/var/lib/pgsql/9.5/data/ rep_mode=sync node_list="pi01 pi02 pi03" restore_command="cp /var/lib/pgsql/9.5/data/wal_archive/%f %p" primary_conninfo_opt="user=repl password=super-pass-for-repl keepalives_idle=60 keepalives_interval=5 keepalives_count=5" master_ip=192.168.242.100 restart_on_promote=true check_wal_receiver=true
   Operations: start interval=0s timeout=60s on-fail=restart (pgsql-start-interval-0s)
               monitor interval=4s timeout=60s on-fail=restart (pgsql-monitor-interval-4s)
               monitor role=Master timeout=60s on-fail=restart interval=3s (pgsql-monitor-interval-3s-role-Master)
               promote interval=0s timeout=60s on-fail=restart (pgsql-promote-interval-0s)
               demote interval=0s timeout=60s on-fail=stop (pgsql-demote-interval-0s)
               stop interval=0s timeout=60s on-fail=block (pgsql-stop-interval-0s)
               notify interval=0s timeout=60s (pgsql-notify-interval-0s)

# pcs resource update msPostgresql pgsql master-max=1 master-node-max=1 clone-max=4 clone-node-max=1 notify=true
# pcs resource update pgsql pgsql node_list="pi01 pi02 pi03 pi05"
# crm_mon -Afr1
Last updated: Fri Oct 2 17:07:05 2015
Last change: Fri Oct 2 17:06:37 2015 by root via cibadmin on pi01
Stack: corosync
Current DC: pi02 (version 1.1.13-a14efad) - partition with quorum
4 nodes and 9 resources configured

Online: [ pi01 pi02 pi03 pi05 ]

Full list of resources:

 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2): Stopped
     vip-rep    (ocf::heartbeat:IPaddr2): Stopped
 Master/Slave Set: msPostgresql [pgsql]
     Slaves: [ pi02 ]
     Stopped: [ pi01 pi03 pi05 ]
 fence-pi01 (stonith:fence_ssh): Started pi02
 fence-pi02 (stonith:fence_ssh): Started pi01
 fence-pi03 (stonith:fence_ssh): Started pi01

Node Attributes:
* Node pi01:
    + master-pgsql      : -INFINITY
    + pgsql-data-status : STREAMING|SYNC
    + pgsql-status      : STOP
* Node pi02:
    + master-pgsql      : -INFINITY
    + pgsql-data-status : LATEST
    + pgsql-status      : STOP
* Node pi03:
    + master-pgsql      : -INFINITY
    + pgsql-data-status : STREAMING|POTENTIAL
    + pgsql-status      : STOP
* Node pi05:
    + master-pgsql      : -INFINITY
    + pgsql-status      : STOP

Migration Summary:
* Node pi01:
* Node pi03:
* Node pi02:
* Node pi05:

After some time it worked:

Every 2.0s: crm_mon -Afr1
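As a side note when debugging such a join: whether corosync itself accepted the new node can be checked independently of Pacemaker via the runtime membership map (command sketch; the key prefix is the one used by corosync 2.x, and the output shape will differ per cluster):

```
# List the current totem members as corosync sees them
corosync-cmapctl | grep 'runtime.totem.pg.mrp.srp.members'

# Cross-check what pcs reports for corosync membership
pcs status corosync
```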
[ClusterLabs] parallel execution of resources
hi, I have around 30 services running on each node of a 2-node cluster. When more than one resource fails, pacemaker tries to restart the resources, but does it sequentially. Is it possible for pacemaker to start resources in parallel?

Are there situations where pacemaker can reboot a node automatically? I was working on a node and it suddenly got rebooted. I suspect that pacemaker could be the reason. Was the node rebooted because of pacemaker?

thanks and regards
p.vijay

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
[ClusterLabs] group resources not grouped ?!?
Hi,
I've got a problem I don't understand; maybe someone can give me a hint.

My 2-node cluster (nodes named ali and baba) is configured to run mysql, an IP for mysql, and the filesystem resource (on the DRBD master) together as a GROUP. After doing some crash-tests I ended up with the filesystem and mysql running happily on one host (ali), and the related IP on the other (baba) - although the IP is not really up and running, crm_mon just SHOWS it as started there. In fact it's up nowhere, neither on ali nor on baba.

crm_mon shows that pacemaker tried to start it on baba, but gave up after fail-count=100.

Q1: why doesn't pacemaker put the IP on ali, where all the rest of its group lives?
Q2: why doesn't pacemaker try to start the IP on ali, after the max failcount has been reached on baba?
Q3: why is crm_mon showing the IP as "started" when it's down after 10 tries?

Thanks :)

config (some parts removed):
---
node ali
node baba
primitive res_drbd ocf:linbit:drbd \
        params drbd_resource="r0" \
        op stop interval="0" timeout="100" \
        op start interval="0" timeout="240" \
        op promote interval="0" timeout="90" \
        op demote interval="0" timeout="90" \
        op notify interval="0" timeout="90" \
        op monitor interval="40" role="Slave" timeout="20" \
        op monitor interval="20" role="Master" timeout="20"
primitive res_fs ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/drbd_mnt" fstype="ext4" \
        op monitor interval="30s"
primitive res_hamysql_ip ocf:heartbeat:IPaddr2 \
        params ip="XXX.XXX.XXX.224" nic="eth0" cidr_netmask="23" \
        op monitor interval="10s" timeout="20s" depth="0"
primitive res_mysql lsb:mysql \
        op start interval="0" timeout="15" \
        op stop interval="0" timeout="15" \
        op monitor start-delay="30" interval="15" time-out="15"
group gr_mysqlgroup res_fs res_mysql res_hamysql_ip \
        meta target-role="Started"
ms ms_drbd res_drbd \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation col_fs_on_drbd_master inf: res_fs:Started ms_drbd:Master
order ord_drbd_master_then_fs inf: ms_drbd:promote res_fs:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
        cluster-infrastructure="openais" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        expected-quorum-votes="2" \
        last-lrm-refresh="1438857246"

crm_mon -rnf (some parts removed):
-
Node ali: online
        res_fs (ocf::heartbeat:Filesystem) Started
        res_mysql (lsb:mysql) Started
        res_drbd:0 (ocf::linbit:drbd) Master
Node baba: online
        res_hamysql_ip (ocf::heartbeat:IPaddr2) Started
        res_drbd:1 (ocf::linbit:drbd) Slave

Inactive resources:

Migration summary:
* Node baba:
   res_hamysql_ip: migration-threshold=100 fail-count=100

Failed actions:
    res_hamysql_ip_stop_0 (node=a891vl107s, call=35, rc=1, status=complete): unknown error

corosync.log:
--
pengine: [1223]: WARN: should_dump_input: Ignoring requirement that res_hamysql_ip_stop_0 complete before gr_mysqlgroup_stopped_0: unmanaged failed resources cannot prevent shutdown
pengine: [1223]: WARN: should_dump_input: Ignoring requirement that res_hamysql_ip_stop_0 complete before gr_mysqlgroup_stopped_0: unmanaged failed resources cannot prevent shutdown

Software:
--
corosync 1.2.1-4
pacemaker 1.0.9.1+hg15626-1
drbd8-utils 2:8.3.7-2.1
(for some reason it's not possible to update at this time)
[ClusterLabs] Antw: parallel execution of resources
>>> Vijay Partha schrieb am 07.10.2015 um 11:39 in Nachricht:
> hi,
>
> I have around 30 services running in each node of a 2 node cluster.
> when more than one resource fails pacemaker tries to restart the resources
> but does it in a sequential way. is it possible for pacemaker to start
> resources parallely?

Normally it does (while following ordering constraints).

> are there situations where pacemaker can reboot a node automatically? I
> was working on a node and suddenly got rebooted. I am having a thought that
> pacemaker could be the reason. is the node rebooted because of pacemaker?

It's called "fencing". Maybe have a look at your syslog or cluster logs.

Regards,
Ulrich

> thanks and regards
> p.vijay
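For completeness: besides ordering constraints, the amount of parallelism is capped by the batch-limit cluster property (the maximum number of actions the cluster executes at once). A command sketch of inspecting and raising it with pcs - the value 50 is purely illustrative:

```
# Show the current value (falls back to Pacemaker's built-in default if unset)
pcs property list --all | grep batch-limit

# Allow up to 50 actions to run in parallel across the cluster
pcs property set batch-limit=50
```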
[ClusterLabs] Antw: group resources not grouped ?!?
>>> zulucloud schrieb am 07.10.2015 um 16:12 in Nachricht <5615284e.8050...@mailbox.org>:
> Hi,
> i got a problem i don't understand, maybe someone can give me a hint.
>
> My 2-node cluster (named ali and baba) is configured to run mysql, an IP
> for mysql and the filesystem resource (on drbd master) together as a
> GROUP. After doing some crash-tests i ended up having filesystem and
> mysql running happily on one host (ali), and the related IP on the other
> (baba) although, the IP's not really up and running, crm_mon just
> SHOWS it as started there. In fact it's nowhere up, neither on ali nor
> on baba.

Then it's most likely a bug in the resource agent. To make sure, try "crm resource reprobe" and be patient for some seconds after that. Then recheck the displayed status.

> crm_mon shows that pacemaker tried to start it on baba, but gave up
> after fail-count=100.

This could mean: multiple start attempts failed, as did stop attempts, so the cluster thinks it might be running. It looks very much like a configuration problem to me.

> Q1: why doesn't pacemaker put the IP on ali, where all the rest of it's
> group lives?

See the log files in detail.

> Q2: why doesn't pacemaker try to start the IP on ali, after max
> failcount had been reached on baba?

Do you have fencing enabled?

> Q3: why is crm_mon showing the IP as "started", when it's down after
> 10 tries?

See above.
> Thanks :)

8-)

> config (some parts removed):
> ---
> node ali
> node baba
>
> primitive res_drbd ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op stop interval="0" timeout="100" \
>     op start interval="0" timeout="240" \
>     op promote interval="0" timeout="90" \
>     op demote interval="0" timeout="90" \
>     op notify interval="0" timeout="90" \
>     op monitor interval="40" role="Slave" timeout="20" \
>     op monitor interval="20" role="Master" timeout="20"
> primitive res_fs ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/drbd_mnt" fstype="ext4" \
>     op monitor interval="30s"
> primitive res_hamysql_ip ocf:heartbeat:IPaddr2 \
>     params ip="XXX.XXX.XXX.224" nic="eth0" cidr_netmask="23" \
>     op monitor interval="10s" timeout="20s" depth="0"
> primitive res_mysql lsb:mysql \
>     op start interval="0" timeout="15" \
>     op stop interval="0" timeout="15" \
>     op monitor start-delay="30" interval="15" time-out="15"
>
> group gr_mysqlgroup res_fs res_mysql res_hamysql_ip \
>     meta target-role="Started"
> ms ms_drbd res_drbd \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>
> colocation col_fs_on_drbd_master inf: res_fs:Started ms_drbd:Master
>
> order ord_drbd_master_then_fs inf: ms_drbd:promote res_fs:start
>
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
>     cluster-infrastructure="openais" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore" \
>     expected-quorum-votes="2" \
>     last-lrm-refresh="1438857246"
>
> crm_mon -rnf (some parts removed):
> -
> Node ali: online
>     res_fs (ocf::heartbeat:Filesystem) Started
>     res_mysql (lsb:mysql) Started
>     res_drbd:0 (ocf::linbit:drbd) Master
> Node baba: online
>     res_hamysql_ip (ocf::heartbeat:IPaddr2) Started
>     res_drbd:1 (ocf::linbit:drbd) Slave
>
> Inactive resources:
>
> Migration summary:
> * Node baba:
>     res_hamysql_ip: migration-threshold=100 fail-count=100
>
> Failed actions:
>     res_hamysql_ip_stop_0 (node=a891vl107s, call=35, rc=1, status=complete): unknown error
>
> corosync.log:
> --
> pengine: [1223]: WARN: should_dump_input: Ignoring requirement that
> res_hamysql_ip_stop_0 complete before gr_mysqlgroup_stopped_0:
> unmanaged failed resources cannot prevent shutdown
>
> pengine: [1223]: WARN: should_dump_input: Ignoring requirement that
> res_hamysql_ip_stop_0 complete before gr_mysqlgroup_stopped_0:
> unmanaged failed resources cannot prevent shutdown
>
> Software:
> --
> corosync 1.2.1-4
> pacemaker 1.0.9.1+hg15626-1
> drbd8-utils 2:8.3.7-2.1
> (for some reason it's not possible to update at this time)
Re: [ClusterLabs] Antw: group resources not grouped ?!?
On 10/07/2015 04:46 PM, Ulrich Windl wrote:
> zulucloud schrieb am 07.10.2015 um 16:12 in Nachricht <5615284e.8050...@mailbox.org>:
>> Hi, i got a problem i don't understand, maybe someone can give me a hint.
>>
>> My 2-node cluster (named ali and baba) is configured to run mysql, an IP
>> for mysql and the filesystem resource (on drbd master) together as a
>> GROUP. After doing some crash-tests i ended up having filesystem and
>> mysql running happily on one host (ali), and the related IP on the other
>> (baba) although, the IP's not really up and running, crm_mon just
>> SHOWS it as started there. In fact it's nowhere up, neither on ali nor
>> on baba.
>
> Then it's most likely a bug in the resource agent. To make sure, try
> "crm resource reprobe" and be patient for some seconds after that.
> Then recheck the displayed status.

In the meantime I already did a "resource cleanup res_hamysql_ip". The failcounts etc. disappeared. After that, a "start gr_mysqlgroup" started everything without a hassle on the correct node.

>> crm_mon shows that pacemaker tried to start it on baba, but gave up
>> after fail-count=100.
>
> This could mean: multiple start attempts failed, as did stop attempts,
> so the cluster thinks it might be running. It looks very much like a
> configuration problem to me.

>> Q1: why doesn't pacemaker put the IP on ali, where all the rest of its
>> group lives?
>
> See the log files in detail.

Well, they're quite verbose and a little bit cryptic... ;) I didn't find anything that could enlighten that for me...

>> Q2: why doesn't pacemaker try to start the IP on ali, after max
>> failcount had been reached on baba?
>
> Do you have fencing enabled?

No. These are 2 virtual machines running together with some other VMs on 2 physical VMware servers. Could you give me a suggestion on how to implement fencing in that situation?
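For VMs on VMware hosts, a commonly used option is the fence_vmware_soap agent, which powers VMs off through vCenter/ESXi. A rough sketch of one such stonith device - the address, credentials and VM name below are hypothetical placeholders, not values from this thread:

```
# One device per node to be fenced; "port" is the VM's name in the
# vSphere inventory (all values here are illustrative).
pcs stonith create fence-ali fence_vmware_soap \
    ipaddr=vcenter.example.com login=fenceuser passwd=secret ssl=1 \
    port="ali-vm-name" pcmk_host_list="ali" \
    op monitor interval=60s
```

A second device would be needed for baba, and stonith-enabled=true set afterwards.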
thx
Re: [ClusterLabs] Antw: crm_report consumes all available RAM
On Tue, Oct 06, 2015 at 11:50:00PM +0200, Jan Pokorný wrote:
> On 06/10/15 10:28 +0200, Dejan Muhamedagic wrote:
>> On Mon, Oct 05, 2015 at 07:00:18PM +0300, Vladislav Bogdanov wrote:
>>> 14.09.2015 02:31, Andrew Beekhof wrote:
>>>> On 8 Sep 2015, at 10:18 pm, Ulrich Windl wrote:
>>>> Vladislav Bogdanov schrieb am 08.09.2015 um 14:05 in Nachricht <55eecefb.8050...@hoster-ok.com>:
>>>>> Hi,
>>>>>
>>>>> just discovered a very interesting issue.
>>>>> If there is a system user with a very big UID (8002 in my case),
>>>>> then crm_report (actually the 'grep' it runs) consumes too much RAM.
>>>>>
>>>>> Relevant part of the process tree at that moment looks like (word-wrap off):
>>>>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>>>>> ...
>>>>> root 25526 0.0 0.0 106364 636 ? S 12:37 0:00 \_ /bin/sh /usr/sbin/crm_report --dest=/var/log/crm_report -f -01-01 00:00:00
>>>>> root 25585 0.0 0.0 106364 636 ? S 12:37 0:00 \_ bash /var/log/crm_report/collector
>>>>> root 25613 0.0 0.0 106364 152 ? S 12:37 0:00 \_ bash /var/log/crm_report/collector
>>>>> root 25614 0.0 0.0 106364 692 ? S 12:37 0:00 \_ bash /var/log/crm_report/collector
>>>>> root 27965 4.9 0.0 100936 452 ? S 12:38 0:01 | \_ cat /var/log/lastlog
>>>>> root 27966 23.0 82.9 3248996 1594688 ? D 12:38 0:08 | \_ grep -l -e Starting Pacemaker

Whoa. grep using up 1.5 gig resident (3.2 gig virtual), still looking for the first newline.

I suggest, in addition to the (good) suggestions so far, to also set a ulimit.

1) export LC_ALL=C, so grep won't take quadratic time trying to make sure it understands unicode correctly; yes, I'm sure that bug has been fixed on most systems meanwhile...

2) ( ulimit -v 10 ; grep ... )

Usually, even with "very many very long lines", my grep stays below a few (~3) megabytes. A limit of 100M seems to be way too much, but if it thinks it needs that much RAM to find a short string, then we are very likely not interested in that file.
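Lars's two suggestions can be reproduced against a sparse file like lastlog; the little demo below (file name and sizes are arbitrary) shows grep being stopped by the ulimit instead of exhausting RAM. Either way grep exits non-zero: 1 if it scans the file without a match, 2 if it runs out of memory first.

```shell
# Emulate /var/log/lastlog: a huge sparse file of NUL bytes with no
# newlines (1 GiB apparent size, almost no disk usage).
truncate -s 1G /tmp/sparse_demo

# Run grep in a subshell capped at ~100 MB of virtual memory, with
# LC_ALL=C. The file has no newline, so grep keeps growing its line
# buffer; the ulimit makes it give up quickly instead of eating all RAM.
if ( ulimit -v 102400; LC_ALL=C grep -q 'Starting Pacemaker' /tmp/sparse_demo ) 2>/dev/null; then
    echo "pattern found"
else
    echo "grep stopped without a match"
fi

rm -f /tmp/sparse_demo
```

The subshell keeps the ulimit from leaking into the rest of the script, which is why Lars wraps the whole pipeline in `( ... )`.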
--
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Re: [ClusterLabs] Antw: crm_report consumes all available RAM
On Tue, Oct 06, 2015 at 11:50:00PM +0200, Jan Pokorný wrote:
> On 06/10/15 10:28 +0200, Dejan Muhamedagic wrote:
>> On Mon, Oct 05, 2015 at 07:00:18PM +0300, Vladislav Bogdanov wrote:
>>> 14.09.2015 02:31, Andrew Beekhof wrote:
>>>> On 8 Sep 2015, at 10:18 pm, Ulrich Windl wrote:
>>>> Vladislav Bogdanov schrieb am 08.09.2015 um 14:05 in Nachricht <55eecefb.8050...@hoster-ok.com>:
>>>>> Hi,
>>>>>
>>>>> just discovered a very interesting issue.
>>>>> If there is a system user with a very big UID (8002 in my case),
>>>>> then crm_report (actually the 'grep' it runs) consumes too much RAM.
>>>>>
>>>>> Relevant part of the process tree at that moment looks like (word-wrap off):
>>>>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>>>>> ...
>>>>> root 25526 0.0 0.0 106364 636 ? S 12:37 0:00 \_ /bin/sh /usr/sbin/crm_report --dest=/var/log/crm_report -f -01-01 00:00:00
>>>>> root 25585 0.0 0.0 106364 636 ? S 12:37 0:00 \_ bash /var/log/crm_report/collector
>>>>> root 25613 0.0 0.0 106364 152 ? S 12:37 0:00 \_ bash /var/log/crm_report/collector
>>>>> root 25614 0.0 0.0 106364 692 ? S 12:37 0:00 \_ bash /var/log/crm_report/collector
>>>>> root 27965 4.9 0.0 100936 452 ? S 12:38 0:01 | \_ cat /var/log/lastlog
>>>>> root 27966 23.0 82.9 3248996 1594688 ? D 12:38 0:08 | \_ grep -l -e Starting Pacemaker
>>>>> root 25615 0.0 0.0 155432 600 ? S 12:37 0:00 \_ sort -u
>>>>>
>>>>> ls -ls /var/log/lastlog shows:
>>>>> 40 -rw-r--r--. 1 root root 2336876 Sep 8 04:36 /var/log/lastlog
>>>>>
>>>>> That is a sparse binary file, which consumes only 40k of disk space.
>>>>> At the same time its size is 23GB, and grep takes all the RAM trying
>>>>> to grep a string from 23GB of mostly zeroes without newlines.
>>>>>
>>>>> I believe this is worth fixing,
>>>>
>>>> Shouldn’t this be directed to the grep folks?
>>>
>>> Actually, not everything in /var/log are textual logs. Currently
>>> findmsg() [z,bz,xz]cats _every_ file there and greps for a pattern.
>>> Shouldn't it skip some well-known ones? btmp, lastlog and wtmp are
>>> good candidates to be skipped. They are not intended to be handled
>>> as text.
>>>
>>> Or maybe just test that the file is text in find_decompressor() and
>>> don't cat it if it is not?
>>>
>>> something like
>>> find_decompressor() {
>>>     if echo $1 | grep -qs 'bz2$'; then
>>>         echo "bzip2 -dc"
>>>     elif echo $1 | grep -qs 'gz$'; then
>>>         echo "gzip -dc"
>>>     elif echo $1 | grep -qs 'xz$'; then
>>>         echo "xz -dc"
>>>     elif file $1 | grep -qs 'text'; then
>>>         echo "cat"
>>>     else
>>>         echo "echo"
>>
>> Good idea.
>
> Even better might be using process substitution and avoid cat'ing if
> not needed even for plain text files, assuming GNU grep 2.13+ that,
> in combination with the kernel, attempts to detect sparse files,
> marking them as binary files[1], which can then be utilized in
> combination with the -I option.

Something like the below, maybe. Untested direct-to-email PoC code.

if echo . | grep -q -I . 2>/dev/null; then
    have_grep_dash_I=true
else
    have_grep_dash_I=false
fi
# similar checks can be made for other decompressors

mygrep()
{
    (
    # sub shell for ulimit

    # ulimit -v ... but maybe someone wants to mmap a huge file,
    # and limiting the virtual size cripples mmap unnecessarily,
    # so let's limit resident size instead. Let's be generous: when
    # decompressing stuff that was compressed with xz -9, we may
    # need ~65 MB according to my man page, and if it was generated
    # by something else, the decompressor may need even more.
    # Grep itself should not use much more than single-digit MB,
    # so if the pipeline below needs more than 200 MB resident,
    # we probably are not interested in that file in any case.
    #
    ulimit -m 20

    # Actually no need for "local" anymore,
    # this is a subshell already. Just a habit.

    local file=$1
    case $file in
    *.bz2) bzgrep "$file";;   # or bzip2 -dc | grep, if you prefer
    *.gz)  zgrep "$file";;
    *.xz)  xzgrep "$file";;
    # ...
    *)
        local file_type=$(file "$file")
        case $file_type in
        *text*)
            grep "$file" ;;
        *)
            # try anyway, let grep use its own heuristic
            $have_grep_dash_I && grep --binary-files=without-match "$file" ;;
        esac ;;
    esac
    )
}
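Vladislav's suffix-plus-file(1) idea can be packaged as a small self-contained function; the variant below is purely illustrative (a single case statement instead of repeated greps) and never streams a binary file such as lastlog:

```shell
#!/bin/sh
# Pick the command used to stream a log file: a decompressor for known
# suffixes, "cat" for files that file(1) reports as text, and a bare
# "echo" otherwise, so binary files are only named, never dumped.
find_decompressor() {
    case "$1" in
        *.bz2) echo "bzip2 -dc" ;;
        *.gz)  echo "gzip -dc" ;;
        *.xz)  echo "xz -dc" ;;
        *)
            if file "$1" | grep -qs text; then
                echo "cat"
            else
                echo "echo"
            fi ;;
    esac
}

# Demo: a text file, a sparse binary file, and a suffix-only match
# (the .gz name is never opened, only pattern-matched).
printf 'Starting Pacemaker\n' > /tmp/demo.txt
truncate -s 1M /tmp/demo.bin
find_decompressor /tmp/demo.txt
find_decompressor /tmp/demo.bin
find_decompressor /var/log/messages.gz
rm -f /tmp/demo.txt /tmp/demo.bin
```

The caller would then do `$(find_decompressor "$f") "$f" | grep ...`, exactly as crm_report's findmsg pipes files today.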
Re: [ClusterLabs] Antw: crm_report consumes all available RAM
On Wed, Oct 07, 2015 at 05:39:01PM +0200, Lars Ellenberg wrote:
> Something like the below, maybe.
> Untested direct-to-email PoC code.
>
> if echo . | grep -q -I . 2>/dev/null; then
>     have_grep_dash_I=true
> else
>     have_grep_dash_I=false
> fi
> # similar checks can be made for other decompressors
>
> mygrep()
> {
>     (
>     # sub shell for ulimit
>
>     # ulimit -v ... but maybe someone wants to mmap a huge file,
>     # and limiting the virtual size cripples mmap unnecessarily,
>     # so let's limit resident size instead. Let's be generous: when
>     # decompressing stuff that was compressed with xz -9, we may
>     # need ~65 MB according to my man page, and if it was generated
>     # by something else, the decompressor may need even more.
>     # Grep itself should not use much more than single-digit MB,
>     # so if the pipeline below needs more than 200 MB resident,
>     # we probably are not interested in that file in any case.
>     #
>     ulimit -m 20

Bah, scratch that:

    RLIMIT_RSS
        No longer has any effect on Linux 2.6.

So we are back to ulimit -v 20.

>     # Actually no need for "local" anymore,
>     # this is a subshell already. Just a habit.
>
>     local file=$1
>     case $file in
>     *.bz2) bzgrep "$file";;   # or bzip2 -dc | grep, if you prefer
>     *.gz)  zgrep "$file";;
>     *.xz)  xzgrep "$file";;
>     # ...
>     *)
>         local file_type=$(file "$file")
>         case $file_type in
>         *text*)
>             grep "$file" ;;
>         *)
>             # try anyway, let grep use its own heuristic
>             $have_grep_dash_I && grep --binary-files=without-match "$file" ;;
>         esac ;;
>     esac
>     )
> }
Re: [ClusterLabs] group resources not grouped ?!?
On 10/07/2015 09:12 AM, zulucloud wrote:
> Hi,
> i got a problem i don't understand, maybe someone can give me a hint.
>
> My 2-node cluster (named ali and baba) is configured to run mysql, an IP
> for mysql and the filesystem resource (on drbd master) together as a
> GROUP. After doing some crash-tests i ended up having filesystem and
> mysql running happily on one host (ali), and the related IP on the other
> (baba) although, the IP's not really up and running, crm_mon just
> SHOWS it as started there. In fact it's nowhere up, neither on ali nor
> on baba.
>
> crm_mon shows that pacemaker tried to start it on baba, but gave up
> after fail-count=100.
>
> Q1: why doesn't pacemaker put the IP on ali, where all the rest of its
> group lives?
> Q2: why doesn't pacemaker try to start the IP on ali, after max
> failcount had been reached on baba?
> Q3: why is crm_mon showing the IP as "started", when it's down after
> 10 tries?
>
> Thanks :)
>
> config (some parts removed):
> ---
> node ali
> node baba
>
> primitive res_drbd ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op stop interval="0" timeout="100" \
>     op start interval="0" timeout="240" \
>     op promote interval="0" timeout="90" \
>     op demote interval="0" timeout="90" \
>     op notify interval="0" timeout="90" \
>     op monitor interval="40" role="Slave" timeout="20" \
>     op monitor interval="20" role="Master" timeout="20"
> primitive res_fs ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/drbd_mnt" fstype="ext4" \
>     op monitor interval="30s"
> primitive res_hamysql_ip ocf:heartbeat:IPaddr2 \
>     params ip="XXX.XXX.XXX.224" nic="eth0" cidr_netmask="23" \
>     op monitor interval="10s" timeout="20s" depth="0"
> primitive res_mysql lsb:mysql \
>     op start interval="0" timeout="15" \
>     op stop interval="0" timeout="15" \
>     op monitor start-delay="30" interval="15" time-out="15"
>
> group gr_mysqlgroup res_fs res_mysql res_hamysql_ip \
>     meta target-role="Started"
> ms ms_drbd res_drbd \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>
> colocation col_fs_on_drbd_master inf: res_fs:Started ms_drbd:Master
>
> order ord_drbd_master_then_fs inf: ms_drbd:promote res_fs:start
>
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
>     cluster-infrastructure="openais" \
>     stonith-enabled="false" \

Not having stonith is part of the problem (see below). Without stonith, if the two nodes go into split brain (both up but unable to communicate with each other), Pacemaker will try to promote DRBD to master on both nodes, mount the filesystem on both nodes, and start MySQL on both nodes.

>     no-quorum-policy="ignore" \
>     expected-quorum-votes="2" \
>     last-lrm-refresh="1438857246"
>
> crm_mon -rnf (some parts removed):
> -
> Node ali: online
>     res_fs (ocf::heartbeat:Filesystem) Started
>     res_mysql (lsb:mysql) Started
>     res_drbd:0 (ocf::linbit:drbd) Master
> Node baba: online
>     res_hamysql_ip (ocf::heartbeat:IPaddr2) Started
>     res_drbd:1 (ocf::linbit:drbd) Slave
>
> Inactive resources:
>
> Migration summary:
>
> * Node baba:
>     res_hamysql_ip: migration-threshold=100 fail-count=100
>
> Failed actions:
>     res_hamysql_ip_stop_0 (node=a891vl107s, call=35, rc=1,
>     status=complete): unknown error

The "_stop_" above means that a *stop* action on the IP failed. Pacemaker tried to migrate the IP by first stopping it on baba, but it couldn't. (Since the IP is the last member of the group, its failure didn't prevent the other members from moving.)

Normally, when a stop fails, Pacemaker fences the node so it can safely bring up the resource on the other node. But you disabled stonith, so it got into this state.

So, to proceed:

1) Stonith would help :)

2) Figure out why it couldn't stop the IP. There might be a clue in the logs on baba (though they are indeed hard to follow; search for "res_hamysql_stop_0" around this time, and look around there).
You could also try adding and removing the IP manually, first with the usual OS commands, and if that works, by calling the IP resource agent directly. That often turns up the problem.

> corosync.log:
> --
> pengine: [1223]: WARN: should_dump_input: Ignoring requirement that
> res_hamysql_ip_stop_0 complete before gr_mysqlgroup_stopped_0:
> unmanaged failed resources cannot prevent shutdown
>
> pengine: [1223]: WARN: should_dump_input: Ignoring requirement that
> res_hamysql_ip_stop_0 complete before gr_mysqlgroup_stopped_0:
> unmanaged failed resources cannot prevent shutdown
>
> Software:
> --
> corosync 1.2.1-4
> pacemaker 1.0.9.1+hg15626-1
> drbd8-utils 2:8.3.7-2.1
> (for some reason it's not possible to update at this time)

It should be possible to get
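Calling the IP resource agent by hand can be sketched like this - OCF agents take their parameters via OCF_RESKEY_* environment variables; the address below is an illustrative placeholder (the real one is masked in the config above) and the agent path assumes a standard resource-agents install:

```
export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_ip="192.0.2.224" \
       OCF_RESKEY_nic="eth0" \
       OCF_RESKEY_cidr_netmask="23"

agent=/usr/lib/ocf/resource.d/heartbeat/IPaddr2
$agent start;   echo "start rc=$?"    # 0 = success
$agent monitor; echo "monitor rc=$?"  # 0 = running, 7 = not running
$agent stop;    echo "stop rc=$?"     # a non-zero rc here is the failure crm_mon reported
```

If start/stop work cleanly by hand but fail under Pacemaker, compare the environment (PATH, SELinux context) the cluster runs the agent in.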