Re: [Pacemaker] Filesystem resource killing innocent processes on stop

2015-05-19 Thread Dejan Muhamedagic
On Mon, May 18, 2015 at 07:34:38PM +0300, Vladislav Bogdanov wrote:
> 18.05.2015 18:57, Nikola Ciprich wrote:
> > Hi Vladislav,
> >
> > > Isn't that a bind-mount?
> > nope, but your question led me to a possible culprit..
> > it's a cephfs mount; when I try some local filesystem, I don't
> > see this weird fuser behaviour..
> >
> > so maybe fuser does not work correctly on cephfs?
>
> yep, for bind-mounts fuser shows/kills both the processes which use
> the bind-mounted tree and those which use the original filesystem.
>
> > this is how the fs is mounted:
> >
> > 10.0.0.1,10.0.0.2,10.0.0.3:/ on /home/cluster/virt type ceph
> > (name=admin,key=client.admin)
> >
> > I should probably ask on the ceph mailing list..
>
> There are alternative ways to determine mountpoint usage btw.
> One of them is lsof, I use it for bind-mounts.

The Filesystem RA supports bind mounts. Is there a problem then
with it using fuser?

Thanks,

Dejan


 
 n.
 
 
 
 
 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org
 
 
 


Re: [Pacemaker] Filesystem resource killing innocent processes on stop

2015-05-19 Thread Dejan Muhamedagic
On Mon, May 18, 2015 at 05:14:14PM +0200, Nikola Ciprich wrote:
> Hi Dejan,
>
> > The list below seems too extensive.  Which version of
> > resource-agents do you run?
> >
> > $ grep 'Build version:' /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs
>
> yes, it's definitely wrong..
>
> here's the info you've requested:
>
> # Build version: 5434e9646462d2c3c8f7aad2609d0ef1875839c7
>
> rpm version: resource-agents-3.9.5-12.el6_6.5.x86_64
>
> I can already see the problem, this version simply uses
> fuser -m $MOUNTPOINT which seems to return pretty wrong results:
>
> [root@denovav1b ~]# fuser -m /home/cluster/virt/
> /home/cluster/virt/: 1m  3295m  3314m  4817m  4846m  4847m  4890m  4891m
> 4916m  4944m  4952m  4999m  5007m  5037m  5069m  5137m  5162m  5164m  5166m
> 5168m  5170m  5172m  5575m  8055m  9604m  9605m 10984m 11186m 11370m 11813m
> 11871m 11887m 11946m 12020m 12026m 12027m 12028m 12029m 12030m 12031m 14218m
> 15294m 15374m 15396m 15399m 17479m 17693m 17694m 20705m 20718m 20948m 20982m
> 23902m 24572m 24580m 26300m 29790m 29792m 30785m
>
> (notice even process # 1!) while lsof returns:
>
> lsof | grep cluster.*virt
> qemu-syst  8055  root   21r   REG    0,0  232783872  1099511634304 /home/cluster/virt/images/debian-7.8.0-amd64-netinst.iso
>
> which seems much saner to me..

Indeed. Is fuser broken or is there some kernel side confusion?
Did you also try:

lsof /home/cluster/virt/

Anyway, it would be good to bring this up with the centos people.
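
As a third opinion next to fuser and lsof, the kernel's own view under
/proc can be read directly. The following is only a minimal sketch, and
several things in it are assumptions rather than anything from this
thread: it is Linux-only, the function name pids_using is made up, and it
looks only at cwd, root, exe and open file descriptors, so memory-mapped
files (the "m" access type in the fuser output above) are not covered.

```shell
#!/bin/sh
# Hypothetical helper, Linux-only: print the PIDs whose cwd, root, exe or
# open file descriptors resolve to a path at or below the given directory.
# Memory-mapped files are NOT checked.
pids_using() {
    mnt=${1%/}                       # tolerate a trailing slash
    for p in /proc/[0-9]*; do
        for link in "$p/cwd" "$p/root" "$p/exe" "$p"/fd/*; do
            # unreadable or vanished links are skipped silently
            target=$(readlink "$link" 2>/dev/null) || continue
            case $target in
                "$mnt"|"$mnt"/*)
                    echo "${p#/proc/}"   # report the PID once
                    break
                    ;;
            esac
        done
    done
}

# Example (run as root to see other users' processes):
#   pids_using /home/cluster/virt
```

If this list and the lsof list agree while fuser -m reports far more
PIDs, that would point at the device-based matching in fuser rather than
at real users of the mount.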

Thanks,

Dejan


Re: [Pacemaker] Filesystem resource killing innocent processes on stop

2015-05-19 Thread Vladislav Bogdanov

19.05.2015 11:46, Dejan Muhamedagic wrote:
> On Mon, May 18, 2015 at 07:34:38PM +0300, Vladislav Bogdanov wrote:
> > There are alternative ways to determine mountpoint usage btw.
> > One of them is lsof, I use it for bind-mounts.
>
> The Filesystem RA supports bind mounts. Is there a problem then
> with it using fuser?

Definitely (but may be kernel/fuser version specific).
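
To see whether a given mount point is a bind mount at all, and which
device it shares, /proc/self/mountinfo can be consulted. This is a rough
sketch under stated assumptions: Linux only, the helper name
mount_identity is invented here, and mount points containing spaces
(escaped as \040 in that file) are not handled. A bind mount of a
subdirectory normally shows a root field other than "/" together with
the same major:minor pair as its source.

```shell
#!/bin/sh
# Hypothetical helper: print "major:minor root fstype" for a mount point.
# Reads /proc/self/mountinfo unless another file is given as second argument.
mount_identity() {
    awk -v mp="$1" '
        $5 == mp {
            # optional fields end at a lone "-"; the fs type follows it
            for (i = 7; i <= NF; i++)
                if ($i == "-") { fstype = $(i + 1); break }
            print $3, $4, fstype
        }' "${2:-/proc/self/mountinfo}"
}

# Example:
#   mount_identity /home/cluster/virt
```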





Re: [Pacemaker] Filesystem resource killing innocent processes on stop

2015-05-19 Thread Vladislav Bogdanov

19.05.2015 11:44, Dejan Muhamedagic wrote:
> On Mon, May 18, 2015 at 05:14:14PM +0200, Nikola Ciprich wrote:
> > I can already see the problem, this version simply uses
> > fuser -m $MOUNTPOINT which seems to return pretty wrong results:
> > [...]
> > (notice even process # 1!) while lsof returns:
> > [...]
> > which seems much saner to me..
>
> Indeed. Is fuser broken or is there some kernel side confusion?

As far as I was able to investigate, this comes from the fact that fuser
matches on the device field, which is the same for the source and the
bind mount (yes, that is on CentOS 6).
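
The device field can be compared by hand. A minimal sketch, assuming GNU
coreutils stat (the -c %d format is GNU syntax) and two made-up helper
names, dev_of and same_device:

```shell
#!/bin/sh
# Print the device number that stat(2) reports for a path.
# "stat -c %d" is GNU coreutils syntax; other platforms differ.
dev_of() {
    stat -c %d -- "$1"
}

# Succeeds when both paths report the same device number, i.e. when a
# purely device-based match cannot tell the two apart.
same_device() {
    [ "$(dev_of "$1")" = "$(dev_of "$2")" ]
}

# Example (paths are placeholders for a bind-mount source and target):
#   same_device /srv/data /mnt/bind && echo "same device"
```

If a bind mount and its source report the same number here, any tool
that selects processes by device alone would be expected to list the
users of both.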




Re: [Pacemaker] Filesystem resource killing innocent processes on stop

2015-05-19 Thread Angie T. Muhammad
Hi Nikola,

I wish I could help, but I have not used Pacemaker for 3 years now, sorry. I
just wanted to thank you for the e-mail subject; it put a big smile on my
face after a long, tiring, not-so-good day. Really, thank you :)

Best regards,
Angie Tawfik


Re: [Pacemaker] Filesystem resource killing innocent processes on stop

2015-05-18 Thread Dejan Muhamedagic
Hi,

On Mon, May 18, 2015 at 12:20:38PM +0200, Nikola Ciprich wrote:
> Hi,
>
> I noticed a very annoying bug (or so I think): the Filesystem OCF resource
> in resource-agents-3.9.5 on RHEL / CentOS 6 seems to be killing completely
> unrelated processes on shutdown, although they're not using anything on the
> mounted filesystem...
>
> unfortunately, one of the processes very often killed is sshd :-(

The list below seems too extensive.  Which version of
resource-agents do you run?

$ grep 'Build version:' /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs

> here's an example of the log:
>
> Filesystem(virt-fs)[4803]:  2015/05/17_21:59:48 INFO: sending signal TERM to: root  3606  1  0 Feb12 ?  Ss  0:01 /sbin/udevd -d
> Filesystem(virt-fs)[4803]:  2015/05/17_21:59:48 INFO: sending signal TERM to: root  4249  1  0 Feb12 ttyS2  Ss+  0:00 agetty ttyS2 115200 vt100
> Filesystem(virt-fs)[4803]:  2015/05/17_21:59:48 INFO: sending signal TERM to: root  4271  4395  0 21:58 ?  Ss  0:00 sshd: root@pts/12
> Filesystem(virt-fs)[4803]:  2015/05/17_21:59:48 INFO: sending signal TERM to: root  4273  1  0 21:58 ?  Rs  0:00 [bash]
> Filesystem(virt-fs)[4803]:  2015/05/17_21:59:48 INFO: sending signal TERM to: root  4395  1  0 Feb24 ?  Ss  0:03 /usr/sbin/sshd
> Filesystem(virt-fs)[4803]:  2015/05/17_21:59:48 INFO: sending signal TERM to: root  4677  1  0 Feb12 ?  Ss  0:00 /sbin/portreserve
> Filesystem(virt-fs)[4803]:  2015/05/17_21:59:48 INFO: sending signal TERM to: root  4690  1  0 Feb12 ?  S  0:00 supervising syslog-ng
> Filesystem(virt-fs)[4803]:  2015/05/17_21:59:48 INFO: sending signal TERM to: root  4691  1  0 Feb12 ?  Ss  0:46 syslog-ng -p /var/run/syslog-ng.pid
> Filesystem(virt-fs)[4803]:  2015/05/17_21:59:48 INFO: sending signal TERM to: rpc  4746  1  0 Feb12 ?  Ss  0:05 rpcbind
> Filesystem(virt-fs)[4803]:  2015/05/17_21:59:48 INFO: sending signal TERM to: rpcuser  4764  1  0 Feb12 ?  Ss  0:00 rpc.statd
> Filesystem(virt-fs)[4803]:  2015/05/17_21:59:48 INFO: sending signal TERM to: root  4797  1  0 Feb12 ?  Ss  0:00 rpc.idmapd
> Filesystem(virt-fs)[4803]:  2015/05/17_21:59:48 INFO: sending signal TERM to: root  4803  12028  0 21:59 ?  S  0:00 /bin/sh /usr/lib/ocf/resource.d/heartbeat/Filesystem stop
>
> while unmounting the /home/cluster/virt directory.. what is quite curious
> is that the last killed process seems to be the Filesystem resource itself..

Hmm, that's quite strange. That implies that the RA script itself
had /home/cluster/virt as its WD.
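
That inference can be checked directly on Linux by reading the working
directory of a PID from /proc. A small sketch; the helper names pid_cwd
and cwd_under are made up for illustration and are not part of the
resource agent:

```shell
#!/bin/sh
# Linux-only: print the current working directory of a PID via /proc.
pid_cwd() {
    readlink "/proc/$1/cwd"
}

# Succeeds when the PID's working directory is the given directory itself
# or lies somewhere below it.
cwd_under() {
    cwd=$(pid_cwd "$1") || return 1
    mnt=${2%/}                       # tolerate a trailing slash
    case $cwd in
        "$mnt"|"$mnt"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, using a PID from the log above while that process still runs:
#   cwd_under 4395 /home/cluster/virt && echo "sshd has its cwd on the mount"
```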

> before I dig deeper into this, did anyone else notice this problem? Is this
> some known (and possibly already fixed) issue?

Never heard of this.

Thanks,

Dejan

> thanks a lot in advance
>
> nik
 
 


Re: [Pacemaker] Filesystem resource killing innocent processes on stop

2015-05-18 Thread emmanuel segura
Are you sure your processes do not have /home/cluster/virt as their
working directory?

I'm using SUSE 11 SP2 and I don't know if the agent is the same in
Red Hat 6, but I think so. Anyway, to unmount the fs the script uses
the following functions: Filesystem_stop -> fs_stop -> signal_processes

In the fs_stop function, the agent tries to kill the processes that are
using the fs, first with the TERM signal and then with KILL:

fs_stop() {
    local SUB=$1 timeout=$2 sig cnt
    for sig in TERM KILL; do
        cnt=$((timeout/2)) # try half time with TERM
        while [ $cnt -gt 0 ]; do
            # done as soon as the unmount succeeds
            try_umount $SUB &&
                return $OCF_SUCCESS
            ocf_log err "Couldn't unmount $SUB; trying cleanup with $sig"
            signal_processes $SUB $sig
            cnt=$((cnt-1))
            sleep 1
        done
    done
    return $OCF_ERR_GENERIC
}

In the signal_processes function, the agent uses fuser to kill the processes:

signal_processes() {
    local dir=$1
    local sig=$2
    # fuser returns a non-zero return code if none of the
    # specified files is accessed or in case of a fatal
    # error.
    if [ "X${HOSTOS}" = "XOpenBSD" ];then
        # OpenBSD has no fuser: take the PIDs from fstat instead
        PIDS=`fstat | grep $dir | awk '{print $3}'`
        for PID in ${PIDS};do
            kill -s $sig ${PID}
            ocf_log info "Sent signal $sig to ${PID}"
        done
    else
        # -m selects every process using the mounted filesystem,
        # -k signals them with $sig
        if $FUSER -$sig -m -k $dir ; then
            ocf_log info "Some processes on $dir were signalled"
        else
            ocf_log info "No processes on $dir were signalled"
        fi
    fi
}
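
Since fuser -k signals whatever fuser -m selects, a wrong selection is
acted on immediately. One way to limit the damage would be to collect the
PIDs first and filter them before signalling. The sketch below is purely
hypothetical and is not how the Filesystem RA works; the function name
filter_protected_pids and the choice of what to protect (PID 1 and the
calling script) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical guard, NOT part of the Filesystem RA: read one PID per line
# on stdin and drop the ones that should never be signalled during a stop.
filter_protected_pids() {
    self=$1                          # PID of the calling agent script
    while read -r pid; do
        case $pid in
            ''|*[!0-9]*) continue ;; # skip blanks and non-numeric input
        esac
        [ "$pid" -eq 1 ] && continue         # never signal init
        [ "$pid" -eq "$self" ] && continue   # never signal ourselves
        echo "$pid"
    done
}

# Example: print what would be signalled, without signalling anything.
# fuser writes the bare PIDs to stdout and the rest to stderr.
#   fuser -m /home/cluster/virt 2>/dev/null | tr -s ' ' '\n' | filter_protected_pids $$
```

This would not fix the wrong fuser results on cephfs or bind mounts; it
only keeps init and the agent itself out of the list.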





-- 
  .~.
  /V\
 //  

Re: [Pacemaker] Filesystem resource killing innocent processes on stop

2015-05-18 Thread Vladislav Bogdanov

18.05.2015 18:57, Nikola Ciprich wrote:
> Hi Vladislav,
>
> > Isn't that a bind-mount?
> nope, but your question led me to a possible culprit..
> it's a cephfs mount; when I try some local filesystem, I don't
> see this weird fuser behaviour..
>
> so maybe fuser does not work correctly on cephfs?

yep, for bind-mounts fuser shows/kills both the processes which use the
bind-mounted tree and those which use the original filesystem.

> this is how the fs is mounted:
>
> 10.0.0.1,10.0.0.2,10.0.0.3:/ on /home/cluster/virt type ceph
> (name=admin,key=client.admin)
>
> I should probably ask on the ceph mailing list..

There are alternative ways to determine mountpoint usage btw.
One of them is lsof, I use it for bind-mounts.





Re: [Pacemaker] Filesystem resource killing innocent processes on stop

2015-05-18 Thread Vladislav Bogdanov

18.05.2015 13:20, Nikola Ciprich wrote:
> Hi,
>
> I noticed a very annoying bug (or so I think): the Filesystem OCF resource
> in resource-agents-3.9.5 on RHEL / CentOS 6 seems to be killing completely
> unrelated processes on shutdown, although they're not using anything on the
> mounted filesystem...

Isn't that a bind-mount?





Re: [Pacemaker] Filesystem resource killing innocent processes on stop

2015-05-18 Thread Nikola Ciprich
Hi Dejan,

 
> The list below seems too extensive.  Which version of
> resource-agents do you run?
>
> $ grep 'Build version:' /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs

yes, it's definitely wrong..

here's the info you've requested:

# Build version: 5434e9646462d2c3c8f7aad2609d0ef1875839c7

rpm version: resource-agents-3.9.5-12.el6_6.5.x86_64

I can already see the problem, this version simply uses
fuser -m $MOUNTPOINT which seems to return pretty wrong results:

[root@denovav1b ~]# fuser -m /home/cluster/virt/
/home/cluster/virt/: 1m  3295m  3314m  4817m  4846m  4847m  4890m  4891m  
4916m  4944m  4952m  4999m  5007m  5037m  5069m  5137m  5162m  5164m  5166m  
5168m  5170m  5172m  5575m  8055m  9604m  9605m 10984m 11186m 11370m 11813m 
11871m 11887m 11946m 12020m 12026m 12027m 12028m 12029m 12030m 12031m 14218m 
15294m 15374m 15396m 15399m 17479m 17693m 17694m 20705m 20718m 20948m 20982m 
23902m 24572m 24580m 26300m 29790m 29792m 30785m

(notice even process # 1!) while lsof returns:

lsof | grep cluster.*virt
qemu-syst  8055  root   21r   REG    0,0  232783872  1099511634304 /home/cluster/virt/images/debian-7.8.0-amd64-netinst.iso

which seems much saner to me..
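
To see what those PIDs actually are before anything gets signalled, the
pasted fuser output can be reduced to bare PIDs and handed to ps. A
minimal sketch; the function name fuser_text_to_pids is made up, and it
assumes the access-type suffixes are the usual fuser letters (c, e, f, F,
r, m):

```shell
#!/bin/sh
# Hypothetical helper: turn "fuser -m" terminal output (read on stdin)
# into one PID per line.
fuser_text_to_pids() {
    # drop the leading "path:" label, split on whitespace, then strip the
    # trailing access-type letters from each numeric token
    sed 's/^[^:]*://' |
        tr -s ' \t' '\n' |
        sed -n 's/^\([0-9][0-9]*\)[cefFrm]*$/\1/p'
}

# Example: list the candidates with ps instead of killing them.
#   ps -o pid,ppid,args -p "$(fuser -m /home/cluster/virt/ 2>&1 | fuser_text_to_pids | paste -sd, -)"
```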

BR

nik


 
 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-



Re: [Pacemaker] Filesystem resource killing innocent processes on stop

2015-05-18 Thread Nikola Ciprich
Hi Vladislav,

 
> Isn't that a bind-mount?

nope, but your question led me to a possible culprit..
it's a cephfs mount; when I try some local filesystem, I don't
see this weird fuser behaviour..

so maybe fuser does not work correctly on cephfs?

this is how the fs is mounted:

10.0.0.1,10.0.0.2,10.0.0.3:/ on /home/cluster/virt type ceph
(name=admin,key=client.admin)

I should probably ask on the ceph mailing list..

n.


 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-

