Re: [Pacemaker] Filesystem resource killing innocent processes on stop
On Mon, May 18, 2015 at 07:34:38PM +0300, Vladislav Bogdanov wrote:
> 18.05.2015 18:57, Nikola Ciprich wrote:
> > Hi Vladislav,
> >
> > > Isn't that a bind-mount?
> >
> > nope, but your question led me to a possible culprit.. it's a cephfs
> > mount; when I try it on some local filesystem, I don't see this weird
> > fuser behaviour.. so maybe fuser does not work correctly on cephfs?
>
> yep, for bind-mounts fuser shows/kills both the processes which use the
> bind-mounted tree and those which use the original filesystem.
>
> > this is how the fs is mounted:
> >
> > 10.0.0.1,10.0.0.2,10.0.0.3:/ on /home/cluster/virt type ceph (name=admin,key=client.admin)
> >
> > I should probably ask on the ceph mailing list..
>
> There are alternative ways to determine mountpoint usage btw. One of
> them is lsof, I use it for bind-mounts.

The Filesystem RA supports bind mounts. Is there a problem then with it
using fuser?

Thanks,

Dejan

> > n.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] Filesystem resource killing innocent processes on stop
On Mon, May 18, 2015 at 05:14:14PM +0200, Nikola Ciprich wrote:
> Hi Dejan,
>
> > The list below seems too extensive. Which version of
> > resource-agents do you run?
> >
> > $ grep 'Build version:' /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs
>
> yes, it's definitely wrong.. here's the info you've requested:
>
> # Build version: 5434e9646462d2c3c8f7aad2609d0ef1875839c7
> rpm version: resource-agents-3.9.5-12.el6_6.5.x86_64
>
> I can already see the problem: this version simply uses
> fuser -m $MOUNTPOINT, which seems to return pretty wrong results:
>
> [root@denovav1b ~]# fuser -m /home/cluster/virt/
> /home/cluster/virt/: 1m 3295m 3314m 4817m 4846m 4847m 4890m 4891m 4916m 4944m 4952m 4999m 5007m 5037m 5069m 5137m 5162m 5164m 5166m 5168m 5170m 5172m 5575m 8055m 9604m 9605m 10984m 11186m 11370m 11813m 11871m 11887m 11946m 12020m 12026m 12027m 12028m 12029m 12030m 12031m 14218m 15294m 15374m 15396m 15399m 17479m 17693m 17694m 20705m 20718m 20948m 20982m 23902m 24572m 24580m 26300m 29790m 29792m 30785m
>
> (notice even process #1!)
>
> while lsof returns:
>
> lsof | grep cluster.*virt
> qemu-syst 8055 root 21r REG 0,0 232783872 1099511634304 /home/cluster/virt/images/debian-7.8.0-amd64-netinst.iso
>
> which seems much saner to me..

Indeed. Is fuser broken or is there some kernel side confusion?

Did you also try:

lsof /home/cluster/virt/

Anyway, it would be good to bring this up with the centos people.

Thanks,

Dejan
Re: [Pacemaker] Filesystem resource killing innocent processes on stop
19.05.2015 11:46, Dejan Muhamedagic wrote:
> On Mon, May 18, 2015 at 07:34:38PM +0300, Vladislav Bogdanov wrote:
> > 18.05.2015 18:57, Nikola Ciprich wrote:
> > > Hi Vladislav,
> > >
> > > > Isn't that a bind-mount?
> > >
> > > nope, but your question led me to a possible culprit.. it's a cephfs
> > > mount; when I try it on some local filesystem, I don't see this weird
> > > fuser behaviour.. so maybe fuser does not work correctly on cephfs?
> >
> > yep, for bind-mounts fuser shows/kills both the processes which use the
> > bind-mounted tree and those which use the original filesystem.
> >
> > > this is how the fs is mounted:
> > >
> > > 10.0.0.1,10.0.0.2,10.0.0.3:/ on /home/cluster/virt type ceph (name=admin,key=client.admin)
> > >
> > > I should probably ask on the ceph mailing list..
> >
> > There are alternative ways to determine mountpoint usage btw. One of
> > them is lsof, I use it for bind-mounts.
>
> The Filesystem RA supports bind mounts. Is there a problem then with it
> using fuser?

Definitely (but it may be kernel/fuser version specific).

> Thanks,
>
> Dejan
Re: [Pacemaker] Filesystem resource killing innocent processes on stop
19.05.2015 11:44, Dejan Muhamedagic wrote:
> On Mon, May 18, 2015 at 05:14:14PM +0200, Nikola Ciprich wrote:
> > Hi Dejan,
> >
> > > The list below seems too extensive. Which version of
> > > resource-agents do you run?
> > >
> > > $ grep 'Build version:' /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs
> >
> > yes, it's definitely wrong.. here's the info you've requested:
> >
> > # Build version: 5434e9646462d2c3c8f7aad2609d0ef1875839c7
> > rpm version: resource-agents-3.9.5-12.el6_6.5.x86_64
> >
> > I can already see the problem: this version simply uses
> > fuser -m $MOUNTPOINT, which seems to return pretty wrong results:
> >
> > [root@denovav1b ~]# fuser -m /home/cluster/virt/
> > /home/cluster/virt/: 1m 3295m 3314m 4817m 4846m 4847m 4890m 4891m 4916m 4944m 4952m 4999m 5007m 5037m 5069m 5137m 5162m 5164m 5166m 5168m 5170m 5172m 5575m 8055m 9604m 9605m 10984m 11186m 11370m 11813m 11871m 11887m 11946m 12020m 12026m 12027m 12028m 12029m 12030m 12031m 14218m 15294m 15374m 15396m 15399m 17479m 17693m 17694m 20705m 20718m 20948m 20982m 23902m 24572m 24580m 26300m 29790m 29792m 30785m
> >
> > (notice even process #1!)
> >
> > while lsof returns:
> >
> > lsof | grep cluster.*virt
> > qemu-syst 8055 root 21r REG 0,0 232783872 1099511634304 /home/cluster/virt/images/debian-7.8.0-amd64-netinst.iso
> >
> > which seems much saner to me..
>
> Indeed. Is fuser broken or is there some kernel side confusion?

As far as I was able to investigate, that comes from the fact that fuser
uses the device field, which is the same for the source and the bind mount
(yes, that is centos6).

> Did you also try:
>
> lsof /home/cluster/virt/
>
> Anyway, it would be good to bring this up with the centos people.
>
> Thanks,
>
> Dejan
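The device-field explanation can be checked from a shell. The following is a minimal sketch, assuming GNU coreutils `stat` (its `-c '%d'` format prints the device number); `same_device` is a hypothetical helper name and is not part of fuser or of the Filesystem agent. Two paths that report the same device number cannot be told apart by a purely device-based match, which is consistent with the bind-mount behaviour described in this message.

```shell
#!/bin/sh
# Hypothetical helper (not from resource-agents): succeed when both paths
# report the same device number. Assumes GNU stat; BSD stat uses other flags.
same_device() {
    dev_a=$(stat -c '%d' "$1") || return 2   # 2 = could not stat a path
    dev_b=$(stat -c '%d' "$2") || return 2
    [ "$dev_a" = "$dev_b" ]
}
```

For example, `same_device /srv/data /mnt/bind-of-data` (both paths are placeholders) would be expected to succeed when the second path is a bind mount of the first.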
Re: [Pacemaker] Filesystem resource killing innocent processes on stop
Hi Nikola,

I wish I could help, but I have not used Pacemaker for 3 years now, sorry.

I just wanted to thank you for the e-mail subject; it drew a big smile on
my face after a long, tiresome, not-so-good day. Really, thank you :)

Best regards,
Angie Tawfik

On 18.05.2015 13:31, Nikola Ciprich nikola.cipr...@linuxbox.cz wrote:
> Hi,
>
> I noticed a very annoying bug (or so I think): in resource-agents-3.9.5
> on RHEL / centos 6, the Filesystem OCF resource seems to be killing
> completely unrelated processes on shutdown, although they're not using
> anything on the mounted filesystem... unfortunately, one of the processes
> very often killed is sshd :-(
>
> here's an example of the log:
>
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 3606 1 0 Feb12 ? Ss 0:01 /sbin/udevd -d
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4249 1 0 Feb12 ttyS2 Ss+ 0:00 agetty ttyS2 115200 vt100
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4271 4395 0 21:58 ? Ss 0:00 sshd: root@pts/12
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4273 1 0 21:58 ? Rs 0:00 [bash]
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4395 1 0 Feb24 ? Ss 0:03 /usr/sbin/sshd
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4677 1 0 Feb12 ? Ss 0:00 /sbin/portreserve
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4690 1 0 Feb12 ? S 0:00 supervising syslog-ng
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4691 1 0 Feb12 ? Ss 0:46 syslog-ng -p /var/run/syslog-ng.pid
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: rpc 4746 1 0 Feb12 ? Ss 0:05 rpcbind
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: rpcuser 4764 1 0 Feb12 ? Ss 0:00 rpc.statd
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4797 1 0 Feb12 ? Ss 0:00 rpc.idmapd
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4803 12028 0 21:59 ? S 0:00 /bin/sh /usr/lib/ocf/resource.d/heartbeat/Filesystem stop
>
> while unmounting the /home/cluster/virt directory.. what is quite curious
> is that the last killed process seems to be the Filesystem resource itself..
>
> before I dig deeper into this, did anyone else notice this problem?
> Is this some known (and possibly already fixed) issue?
>
> thanks a lot in advance
>
> nik
Re: [Pacemaker] Filesystem resource killing innocent processes on stop
Hi,

On Mon, May 18, 2015 at 12:20:38PM +0200, Nikola Ciprich wrote:
> Hi,
>
> I noticed a very annoying bug (or so I think): in resource-agents-3.9.5
> on RHEL / centos 6, the Filesystem OCF resource seems to be killing
> completely unrelated processes on shutdown, although they're not using
> anything on the mounted filesystem... unfortunately, one of the processes
> very often killed is sshd :-(

The list below seems too extensive. Which version of resource-agents do
you run?

$ grep 'Build version:' /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs

> here's an example of the log:
>
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 3606 1 0 Feb12 ? Ss 0:01 /sbin/udevd -d
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4249 1 0 Feb12 ttyS2 Ss+ 0:00 agetty ttyS2 115200 vt100
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4271 4395 0 21:58 ? Ss 0:00 sshd: root@pts/12
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4273 1 0 21:58 ? Rs 0:00 [bash]
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4395 1 0 Feb24 ? Ss 0:03 /usr/sbin/sshd
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4677 1 0 Feb12 ? Ss 0:00 /sbin/portreserve
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4690 1 0 Feb12 ? S 0:00 supervising syslog-ng
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4691 1 0 Feb12 ? Ss 0:46 syslog-ng -p /var/run/syslog-ng.pid
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: rpc 4746 1 0 Feb12 ? Ss 0:05 rpcbind
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: rpcuser 4764 1 0 Feb12 ? Ss 0:00 rpc.statd
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4797 1 0 Feb12 ? Ss 0:00 rpc.idmapd
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4803 12028 0 21:59 ? S 0:00 /bin/sh /usr/lib/ocf/resource.d/heartbeat/Filesystem stop
>
> while unmounting the /home/cluster/virt directory.. what is quite curious
> is that the last killed process seems to be the Filesystem resource itself..

Hmm, that's quite strange. That implies that the RA script itself had
/home/cluster/virt as its WD.

> before I dig deeper into this, did anyone else notice this problem?
> Is this some known (and possibly already fixed) issue?

Never heard of this.

Thanks,

Dejan

> thanks a lot in advance
>
> nik
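The working-directory remark suggests a simple precaution for any script that is about to unmount a filesystem: step out of the mountpoint first. A minimal sketch, assuming a POSIX shell; `leave_mountpoint` is a hypothetical helper and is not taken from the Filesystem agent. Whether the agent's working directory was really inside the mount in this report is not established by the thread.

```shell
#!/bin/sh
# Hypothetical helper: if the current working directory is the given
# directory or lies below it, move to / so this shell no longer pins it.
leave_mountpoint() {
    mnt=$(cd "$1" 2>/dev/null && pwd -P) || return 1  # resolve symlinks
    here=$(pwd -P)
    case "$here" in
        "$mnt"|"$mnt"/*) cd / ;;
    esac
}
```

A caller would run `leave_mountpoint /home/cluster/virt` before attempting the unmount; the function leaves the working directory alone when it is outside the given path.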
Re: [Pacemaker] Filesystem resource killing innocent processes on stop
Are you sure your processes do not have /home/cluster/virt as their
working directory?

I'm using SUSE 11 SP2 and I don't know if the agent is the same in Red Hat 6,
but I think so. Anyway, for unmounting the fs the script uses the following
functions:

Filesystem_stop -> fs_stop -> signal_processes

In the fs_stop function, the cluster tries to kill the processes that are
using the fs, first with the TERM signal and then with KILL:

    fs_stop() {
        local SUB=$1 timeout=$2 sig cnt
        for sig in TERM KILL; do
            cnt=$((timeout/2)) # try half time with TERM
            while [ $cnt -gt 0 ]; do
                # stop as soon as the unmount succeeds
                try_umount $SUB &&
                    return $OCF_SUCCESS
                ocf_log err "Couldn't unmount $SUB; trying cleanup with $sig"
                signal_processes $SUB $sig
                cnt=$((cnt-1))
                sleep 1
            done
        done
        return $OCF_ERR_GENERIC
    }

In the signal_processes function, the cluster uses fuser to kill the
processes:

    signal_processes() {
        local dir=$1
        local sig=$2
        # fuser returns a non-zero return code if none of the
        # specified files is accessed or in case of a fatal
        # error.
        if [ "X${HOSTOS}" = "XOpenBSD" ];then
            PIDS=`fstat | grep $dir | awk '{print $3}'`
            for PID in ${PIDS};do
                kill -s $sig ${PID}
                ocf_log info "Sent signal $sig to ${PID}"
            done
        else
            if $FUSER -$sig -m -k $dir ; then
                ocf_log info "Some processes on $dir were signalled"
            else
                ocf_log info "No processes on $dir were signalled"
            fi
        fi
    }

2015-05-18 12:20 GMT+02:00 Nikola Ciprich nikola.cipr...@linuxbox.cz:
> Hi,
>
> I noticed a very annoying bug (or so I think): in resource-agents-3.9.5
> on RHEL / centos 6, the Filesystem OCF resource seems to be killing
> completely unrelated processes on shutdown, although they're not using
> anything on the mounted filesystem... unfortunately, one of the processes
> very often killed is sshd :-(
>
> here's an example of the log:
>
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 3606 1 0 Feb12 ? Ss 0:01 /sbin/udevd -d
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4249 1 0 Feb12 ttyS2 Ss+ 0:00 agetty ttyS2 115200 vt100
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4271 4395 0 21:58 ? Ss 0:00 sshd: root@pts/12
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4273 1 0 21:58 ? Rs 0:00 [bash]
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4395 1 0 Feb24 ? Ss 0:03 /usr/sbin/sshd
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4677 1 0 Feb12 ? Ss 0:00 /sbin/portreserve
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4690 1 0 Feb12 ? S 0:00 supervising syslog-ng
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4691 1 0 Feb12 ? Ss 0:46 syslog-ng -p /var/run/syslog-ng.pid
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: rpc 4746 1 0 Feb12 ? Ss 0:05 rpcbind
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: rpcuser 4764 1 0 Feb12 ? Ss 0:00 rpc.statd
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4797 1 0 Feb12 ? Ss 0:00 rpc.idmapd
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4803 12028 0 21:59 ? S 0:00 /bin/sh /usr/lib/ocf/resource.d/heartbeat/Filesystem stop
>
> while unmounting the /home/cluster/virt directory.. what is quite curious
> is that the last killed process seems to be the Filesystem resource itself..
>
> before I dig deeper into this, did anyone else notice this problem?
> Is this some known (and possibly already fixed) issue?
>
> thanks a lot in advance
>
> nik
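The retry loop in fs_stop can be exercised without touching a real mount. A minimal sketch with stubbed helpers: `try_umount_stub`, `signal_stub`, `SUCCEED_ON`, `SIGNALS_SENT` and `stop_with_escalation` are hypothetical names used only for illustration. Only the loop shape follows the fs_stop function quoted in this message, and the `sleep 1` between attempts is left out so the example runs instantly.

```shell
#!/bin/sh
# Stub state: the pretend unmount succeeds on attempt number SUCCEED_ON.
ATTEMPTS=0
SUCCEED_ON=4
SIGNALS_SENT=""

try_umount_stub() {
    ATTEMPTS=$((ATTEMPTS + 1))
    [ "$ATTEMPTS" -ge "$SUCCEED_ON" ]
}

# Record the signal instead of sending it anywhere.
signal_stub() {
    SIGNALS_SENT="$SIGNALS_SENT$1 "
}

# Same shape as fs_stop: half of the attempts use TERM, the rest use KILL.
stop_with_escalation() {
    timeout=$1
    for sig in TERM KILL; do
        cnt=$((timeout / 2))
        while [ "$cnt" -gt 0 ]; do
            try_umount_stub && return 0
            signal_stub "$sig"
            cnt=$((cnt - 1))
        done
    done
    return 1
}
```

With a timeout of 4 and a stub that succeeds on the fourth attempt, the recorded sequence is two TERM rounds followed by one KILL round. This illustrates that every failed unmount attempt is followed by a signalling pass, so a wrong process list from fuser is acted on repeatedly.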
Re: [Pacemaker] Filesystem resource killing innocent processes on stop
18.05.2015 18:57, Nikola Ciprich wrote:
> Hi Vladislav,
>
> > Isn't that a bind-mount?
>
> nope, but your question led me to a possible culprit.. it's a cephfs
> mount; when I try it on some local filesystem, I don't see this weird
> fuser behaviour.. so maybe fuser does not work correctly on cephfs?

yep, for bind-mounts fuser shows/kills both the processes which use the
bind-mounted tree and those which use the original filesystem.

> this is how the fs is mounted:
>
> 10.0.0.1,10.0.0.2,10.0.0.3:/ on /home/cluster/virt type ceph (name=admin,key=client.admin)
>
> I should probably ask on the ceph mailing list..

There are alternative ways to determine mountpoint usage btw. One of
them is lsof, I use it for bind-mounts.

> n.
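Besides lsof, path-based usage can also be read from /proc on Linux. A rough sketch under stated assumptions: `pids_under_path` and the `PROC_ROOT` override are hypothetical names. It compares the targets of each process's cwd, root, exe and open file descriptor links against the given directory, so it matches on path rather than on device number. It does not look at memory-mapped files (those are listed in /proc/PID/maps), and it needs root privileges to inspect other users' processes.

```shell
#!/bin/sh
# Where to look for per-process directories; overridable for testing.
PROC_ROOT=${PROC_ROOT:-/proc}

# Hypothetical helper: print the PIDs whose cwd, root, exe or any open fd
# points at the given directory or at something below it.
pids_under_path() {
    target=$(cd "$1" 2>/dev/null && pwd -P) || return 1
    for p in "$PROC_ROOT"/[0-9]*; do
        [ -d "$p" ] || continue
        for link in "$p/cwd" "$p/root" "$p/exe" "$p"/fd/*; do
            # unreadable or missing links are skipped
            dest=$(readlink "$link" 2>/dev/null) || continue
            case "$dest" in
                "$target"|"$target"/*)
                    echo "${p##*/}"   # report each PID once
                    break
                    ;;
            esac
        done
    done
}
```

Calling `pids_under_path /home/cluster/virt` would list candidate PIDs one per line; comparing that list with the output of `fuser -m` on the same path is one way to spot the kind of discrepancy reported in this thread.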
Re: [Pacemaker] Filesystem resource killing innocent processes on stop
18.05.2015 13:20, Nikola Ciprich wrote:
> Hi,
>
> I noticed a very annoying bug (or so I think): in resource-agents-3.9.5
> on RHEL / centos 6, the Filesystem OCF resource seems to be killing
> completely unrelated processes on shutdown, although they're not using
> anything on the mounted filesystem...

Isn't that a bind-mount?

> unfortunately, one of the processes very often killed is sshd :-(
>
> here's an example of the log:
>
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 3606 1 0 Feb12 ? Ss 0:01 /sbin/udevd -d
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4249 1 0 Feb12 ttyS2 Ss+ 0:00 agetty ttyS2 115200 vt100
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4271 4395 0 21:58 ? Ss 0:00 sshd: root@pts/12
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4273 1 0 21:58 ? Rs 0:00 [bash]
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4395 1 0 Feb24 ? Ss 0:03 /usr/sbin/sshd
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4677 1 0 Feb12 ? Ss 0:00 /sbin/portreserve
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4690 1 0 Feb12 ? S 0:00 supervising syslog-ng
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4691 1 0 Feb12 ? Ss 0:46 syslog-ng -p /var/run/syslog-ng.pid
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: rpc 4746 1 0 Feb12 ? Ss 0:05 rpcbind
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: rpcuser 4764 1 0 Feb12 ? Ss 0:00 rpc.statd
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4797 1 0 Feb12 ? Ss 0:00 rpc.idmapd
> Filesystem(virt-fs)[4803]: 2015/05/17_21:59:48 INFO: sending signal TERM to: root 4803 12028 0 21:59 ? S 0:00 /bin/sh /usr/lib/ocf/resource.d/heartbeat/Filesystem stop
>
> while unmounting the /home/cluster/virt directory.. what is quite curious
> is that the last killed process seems to be the Filesystem resource itself..
>
> before I dig deeper into this, did anyone else notice this problem?
> Is this some known (and possibly already fixed) issue?
>
> thanks a lot in advance
>
> nik
Re: [Pacemaker] Filesystem resource killing innocent processes on stop
Hi Dejan,

> The list below seems too extensive. Which version of
> resource-agents do you run?
>
> $ grep 'Build version:' /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs

yes, it's definitely wrong.. here's the info you've requested:

# Build version: 5434e9646462d2c3c8f7aad2609d0ef1875839c7
rpm version: resource-agents-3.9.5-12.el6_6.5.x86_64

I can already see the problem: this version simply uses
fuser -m $MOUNTPOINT, which seems to return pretty wrong results:

[root@denovav1b ~]# fuser -m /home/cluster/virt/
/home/cluster/virt/: 1m 3295m 3314m 4817m 4846m 4847m 4890m 4891m 4916m 4944m 4952m 4999m 5007m 5037m 5069m 5137m 5162m 5164m 5166m 5168m 5170m 5172m 5575m 8055m 9604m 9605m 10984m 11186m 11370m 11813m 11871m 11887m 11946m 12020m 12026m 12027m 12028m 12029m 12030m 12031m 14218m 15294m 15374m 15396m 15399m 17479m 17693m 17694m 20705m 20718m 20948m 20982m 23902m 24572m 24580m 26300m 29790m 29792m 30785m

(notice even process #1!)

while lsof returns:

lsof | grep cluster.*virt
qemu-syst 8055 root 21r REG 0,0 232783872 1099511634304 /home/cluster/virt/images/debian-7.8.0-amd64-netinst.iso

which seems much saner to me..

BR

nik

-- 
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.: +420 591 166 214
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
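Given a listing like the fuser output in this message, a script that goes on to signal processes could at least refuse obviously wrong targets. A minimal sketch, assuming PID tokens in the form shown (digits followed by an optional access-type letter such as `m`); `filter_targets` is a hypothetical helper and not something the Filesystem agent provides. It drops PID 1 and the caller's own PID, and ignores tokens that do not start with a digit, such as the leading path label.

```shell
#!/bin/sh
# Hypothetical helper: usage  filter_targets OWN_PID TOKEN...
# Prints one surviving PID per line.
filter_targets() {
    self=$1
    shift
    for tok in "$@"; do
        # keep the leading digits: "3295m" -> "3295", "/path/:" -> ""
        pid=${tok%%[!0-9]*}
        case "$pid" in
            ''|1|"$self") continue ;;   # not a PID, init, or ourselves
        esac
        echo "$pid"
    done
}
```

For instance, `filter_targets $$ $(fuser -m /home/cluster/virt 2>/dev/null)` would pass through everything except init and the calling shell. This only guards against the two most damaging cases mentioned in the thread; it does not make a wrong fuser listing correct.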
Re: [Pacemaker] Filesystem resource killing innocent processes on stop
Hi Vladislav,

> Isn't that a bind-mount?

nope, but your question led me to a possible culprit.. it's a cephfs
mount; when I try it on some local filesystem, I don't see this weird
fuser behaviour.. so maybe fuser does not work correctly on cephfs?

this is how the fs is mounted:

10.0.0.1,10.0.0.2,10.0.0.3:/ on /home/cluster/virt type ceph (name=admin,key=client.admin)

I should probably ask on the ceph mailing list..

n.

-- 
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.: +420 591 166 214
fax: +420 596 621 273
mobil: +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
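Since the odd behaviour seems tied to the filesystem type, a script could look the type up before deciding how much to trust `fuser -m`. A minimal sketch that parses `mount`-style lines in the format shown in this message (`SOURCE on MOUNTPOINT type FSTYPE (OPTIONS)`); `fstype_of` is a hypothetical helper, and it assumes mountpoints without embedded spaces.

```shell
#!/bin/sh
# Hypothetical helper: read mount(8)-style lines on stdin and print the
# filesystem type of the given mountpoint (nothing if it is not listed).
fstype_of() {
    awk -v mp="$1" '$2 == "on" && $3 == mp && $4 == "type" { print $5; exit }'
}
```

Typical use would be `mount | fstype_of /home/cluster/virt`, which for the mount line quoted here would be expected to print `ceph`.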