Thanks, Ram!

I did notice this in the keepalived 1.2.20 changelog:
http://www.keepalived.org/changelog.html

"Optimise closure of fds before invoking scripts. Every time before a script 
was invoked, closeall() was called, which would spin through 1024 file 
descriptors closing them, even though the vast majority were not open, 
resulting in 1024 system calls. To avoid that, open all sockets and file 
descriptors (except fd 0/1/2) with the CLOEXEC flag set, so that the fds will 
be closed by the kernel when the script is exec'd."
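A minimal C sketch of the two approaches the changelog contrasts, assuming a Linux/glibc system. All function names (close_all_fds, run_script_legacy, open_cloexec_file, open_cloexec_socket, has_cloexec, run_script) are hypothetical and are not keepalived's actual source; the 1024-descriptor bound comes from the changelog text and is passed in as a parameter here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Old approach (hypothetical helper): in the child, call close() on every
 * descriptor number above stderr, one syscall each, even if it is not open. */
static void close_all_fds(int max_fd)
{
    for (int fd = 3; fd < max_fd; fd++)
        close(fd); /* most calls fail with EBADF but still cost a syscall */
}

/* Fork, close everything, then exec the script via /bin/sh -c. */
static int run_script_legacy(const char *cmd, int max_fd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close_all_fds(max_fd); /* e.g. 1024 close() calls per invocation */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127); /* exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* New approach: create descriptors with close-on-exec already set, so the
 * kernel closes them during execve() and no close loop is needed. */
static int open_cloexec_file(const char *path)
{
    return open(path, O_RDONLY | O_CLOEXEC);
}

static int open_cloexec_socket(void)
{
    return socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0);
}

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if not, -1 on error. */
static int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}

/* Fork and exec the script directly; CLOEXEC fds vanish at exec time. */
static int run_script(const char *cmd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127); /* exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that dup() does not carry FD_CLOEXEC over to the new descriptor, so any descriptor created without O_CLOEXEC/SOCK_CLOEXEC (or later duplicated) would still leak into the script unless it is flagged or closed explicitly.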


----- Original Message -----
From: Ram Ranganathan <[email protected]>
To: Chuck Sochin <[email protected]>
Cc: users <[email protected]>
Sent: Wed, 06 Apr 2016 05:04:50 -0000 (UTC)
Subject: Re: Openshift HA environment - keepalived high number of close syscalls

I hadn't seen this until you mentioned it, but I can see the close calls in my
local env too. They appear to happen in a new process, right after a clone()
syscall, roughly every couple of seconds. So they are likely part of the script
that does the health check:

     script "</dev/tcp/${ip}/${watch_port}"
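The `</dev/tcp/${ip}/${watch_port}` redirect is a bash feature: bash opens a TCP connection to that address and port, and the (otherwise empty) command succeeds only if the connection succeeds; keepalived treats a script exit status of 0 as healthy. A minimal C sketch of the same check follows. The function name tcp_port_check is hypothetical, it handles IPv4 only, and it does a blocking connect() with no timeout, so an unreachable host can stall it for a long time.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if a TCP connect to ip:port succeeds, 1 otherwise, mirroring
 * the script exit status keepalived reads (0 = healthy). */
static int tcp_port_check(const char *ip, uint16_t port)
{
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return 1; /* not a valid IPv4 address */

    int fd = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return 1;
    int rc = connect(fd, (struct sockaddr *)&addr, sizeof addr);
    close(fd); /* like the bash redirect, only the connect itself matters */
    return rc == 0 ? 0 : 1;
}
```

Each bash invocation of this check is a separate fork/exec, which matches the clone() followed by a burst of close() calls seen every couple of seconds.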

That said, I don't see a CPU slowdown on my instance: it has been running at
about 1% for the last 30-odd minutes, so I suspect the problem may have to do
with the agent/sysdig in your case.

Filing a bug would be good. I spent some time on it just now but couldn't
figure out what's causing it or whether it's a "feature".
 

Thanks,

Ram//


On Tue, Apr 5, 2016 at 2:04 PM, Chuck Sochin <[email protected]> wrote:
Using OSEv3.1.1

I'm looking to set up sysdig in our native HA OpenShift environment, but I'm
having issues getting the agent to run on our infra nodes hosting keepalived
and HAProxy; the agent runs without issue on all the other nodes in our env.

After the agent has been running for an hour or two, the node hangs and our
hypervisor reports 100% CPU utilization. A power reset is the only way to
bring the node back to life. The problem may be keepalived making an extremely
large number (around 17 million per minute) of close syscalls, and those close
operations appear to target every possible fd. Is this expected behavior of
keepalived running in an OSEv3.1.1 HA environment?

Thanks!


_______________________________________________

users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users



-- 

Ram//
main(O,s){s=--O;10<putchar(3^O?97-(15&7183>>4*s)*(O++?-1:1):10)&&\
main(++O,s++);}



