Re: [ClusterLabs] pcsd processes using 100% CPU

Casey & Gina Wed, 23 May 2018 11:43:24 -0700

Okay, I have this happening again on a couple of servers right now, and am happy
to let it spin and dig into it further.  I'm not at all experienced with stuff
like this, though, so I will need some explicit instructions on what to do
beyond what I've documented here...


I don't see anything of note in pcsd.log; it seems to be just normal activity
logged by the master process, which isn't the runaway one.  Here's a snippet:

10.124.167.177 - - [23/May/2018:15:56:34 +0000] "GET /remote/get_configs HTTP/1.1" 200 553 0.0145
10.124.167.177 - - [23/May/2018:15:56:34 +0000] "GET /remote/get_configs HTTP/1.1" 200 553 0.0147
10.124.167.177 - - [23/May/2018:15:56:34 UTC] "GET /remote/get_configs HTTP/1.1" 200 553
- -> /remote/get_configs
I, [2018-05-23T15:56:37.972682 #1378]  INFO -- : Running: /usr/sbin/corosync-cmapctl totem.cluster_name
I, [2018-05-23T15:56:37.972805 #1378]  INFO -- : CIB USER: hacluster, groups: 
I, [2018-05-23T15:56:37.982066 #1378]  INFO -- : Return Value: 0
10.124.167.176 - - [23/May/2018:15:56:37 +0000] "GET /remote/get_configs HTTP/1.1" 200 553 0.0107
10.124.167.176 - - [23/May/2018:15:56:37 +0000] "GET /remote/get_configs HTTP/1.1" 200 553 0.0108
10.124.167.176 - - [23/May/2018:15:56:37 UTC] "GET /remote/get_configs HTTP/1.1" 200 553
- -> /remote/get_configs
I, [2018-05-23T15:57:10.648134 #1378]  INFO -- : Running: /usr/sbin/corosync-cmapctl totem.cluster_name
I, [2018-05-23T15:57:10.648276 #1378]  INFO -- : CIB USER: hacluster, groups: 
I, [2018-05-23T15:57:10.660617 #1378]  INFO -- : Return Value: 0
10.124.167.178 - - [23/May/2018:15:57:10 +0000] "GET /remote/get_configs HTTP/1.1" 200 553 0.0140
10.124.167.178 - - [23/May/2018:15:57:10 +0000] "GET /remote/get_configs HTTP/1.1" 200 553 0.0141
10.124.167.178 - - [23/May/2018:15:57:10 UTC] "GET /remote/get_configs HTTP/1.1" 200 553
- -> /remote/get_configs
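To check whether the master process's log contains anything beyond this routine
polling, the access-log lines can be tallied per client address and path.  This
is a minimal sketch that assumes only the line format visible in the snippet
above (client IP, bracketed timestamp, quoted request, status, size, and an
optional duration in seconds); the function name summarize_access_log and the
SAMPLE data are illustrative, not part of pcsd.

```python
import re
from collections import Counter, defaultdict

# Matches access-log lines like the ones in the snippet, e.g.
# 10.124.167.177 - - [23/May/2018:15:56:34 +0000] "GET /remote/get_configs HTTP/1.1" 200 553 0.0145
# The trailing duration is optional because one copy of each request omits it.
ACCESS_RE = re.compile(
    r'^(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) - - \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \d+'
    r'(?: (?P<secs>[\d.]+))?$'
)


def summarize_access_log(lines):
    """Count requests per (client IP, path) and record the slowest duration seen."""
    counts = Counter()
    slowest = defaultdict(float)
    for line in lines:
        m = ACCESS_RE.match(line.strip())
        if not m:
            continue  # skip "I, [...] INFO" lines and "- -> /path" lines
        key = (m["ip"], m["path"])
        counts[key] += 1
        if m["secs"]:
            slowest[key] = max(slowest[key], float(m["secs"]))
    return counts, slowest


# Illustrative input copied from the snippet above.
SAMPLE = [
    '10.124.167.177 - - [23/May/2018:15:56:34 +0000] "GET /remote/get_configs HTTP/1.1" 200 553 0.0145',
    '10.124.167.177 - - [23/May/2018:15:56:34 +0000] "GET /remote/get_configs HTTP/1.1" 200 553 0.0147',
    '10.124.167.177 - - [23/May/2018:15:56:34 UTC] "GET /remote/get_configs HTTP/1.1" 200 553',
    '- -> /remote/get_configs',
    'I, [2018-05-23T15:56:37.972682 #1378]  INFO -- : Running: /usr/sbin/corosync-cmapctl totem.cluster_name',
    '10.124.167.176 - - [23/May/2018:15:56:37 +0000] "GET /remote/get_configs HTTP/1.1" 200 553 0.0107',
]

if __name__ == "__main__":
    counts, slowest = summarize_access_log(SAMPLE)
    for (ip, path), n in counts.most_common():
        print(f"{ip:16} {path:24} {n:4}  slowest={slowest[(ip, path)]:.4f}s")
```

On a live node the same function can be fed an open file handle for pcsd.log
(wherever the Debian/Ubuntu package puts it) instead of SAMPLE.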

I ran `strace -p <pid>`, and the screen filled with the following line
repeating as fast as my terminal could render it:
sched_yield()                           = 0
sched_yield()                           = 0
sched_yield()                           = 0

I redirected this into a file for about 1 second and it filled with about 
20,000 of those lines.
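At roughly 20,000 identical lines per second, a capture like this is easier to
reason about as counts than as raw text, because any rare syscall hiding among
the sched_yield() lines then stands out.  A minimal sketch, assuming the output
was saved with something like `strace -p <pid> -o capture.txt` (the file name
is hypothetical); `strace -c -p <pid>` produces a similar per-syscall summary
directly when interrupted with Ctrl-C.

```python
import re
from collections import Counter

# A syscall line in strace output looks like:
#   sched_yield()                           = 0
# With -f, lines may be prefixed by "[pid N] " (terminal) or "N  " (with -o).
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(?P<name>[a-z_][a-z0-9_]*)\(")


def tally_syscalls(lines):
    """Return a Counter of syscall names found in strace output lines."""
    counts = Counter()
    for line in lines:
        m = SYSCALL_RE.match(line)
        if m:  # signal lines ("--- SIG... ---") and "+++ exited" lines are skipped
            counts[m["name"]] += 1
    return counts


if __name__ == "__main__":
    sample = ["sched_yield()                           = 0\n"] * 5
    # Illustrative line only; it does not come from the capture above.
    sample.append("1234  futex(0x1, FUTEX_WAKE_PRIVATE, 1) = 0\n")
    for name, n in tally_syscalls(sample).most_common():
        print(f"{name:20} {n}")
```

For a real capture, pass an open file handle, e.g.
`tally_syscalls(open("capture.txt"))`.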

I installed ltrace, but didn't really know how to use it...

`ltrace -p <pid>` didn't output anything.

`ltrace -p <pid> -S` showed something similar to strace:

SYS_sched_yield(0x7f0ebc3f5c40, 0x7f0ebc3f5c40, 0, 0x7273752f3a6e6962) = 0
SYS_sched_yield(0x7f0ebc3f5c40, 0x7f0ebc3f5c40, 0, 0x7273752f3a6e6962) = 0
SYS_sched_yield(0x7f0ebc3f5c40, 0x7f0ebc3f5c40, 0, 0x7273752f3a6e6962) = 0
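The sched_yield system call takes no arguments, so the four values that
`ltrace -S` prints are most likely whatever happened to be left in the argument
registers, not real parameters.  Reading the last value as little-endian bytes
(assuming x86-64 byte order, which the 0x7f... addresses suggest) supports
this: it decodes to ASCII text that looks like a fragment of a PATH-style
string.  A minimal sketch of that decoding; hex_to_le_ascii is a hypothetical
helper name.

```python
def hex_to_le_ascii(value: int, width: int = 8) -> str:
    """Interpret an integer as `width` little-endian bytes and decode them as ASCII."""
    return value.to_bytes(width, "little").decode("ascii", errors="replace")


if __name__ == "__main__":
    # Fourth "argument" from the ltrace -S output above.
    print(hex_to_le_ascii(0x7273752f3a6e6962))  # → bin:/usr
```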

I next enabled debugging in /etc/default/pcsd and issued a `systemctl restart 
pcsd`.  Unfortunately, that killed the runaway child process.

However, I found another server where it's also happening again.  Debugging is 
not enabled there, but is there anything else I can do while the process is 
still running?

Here are the pcsd processes:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      6103  0.0  0.3 1076744 59972 ?       Ssl  Apr06  67:17 /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
root     24923 99.8  0.3 1076744 52744 ?       Rl   May19 5556:31  \_ /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb & > /dev/null &
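The runaway child is distinguishable from the healthy master by its state (R,
running) and its near-100% CPU.  A minimal sketch for spotting the same pattern
on other nodes, assuming the output of `ps -eo pid,stat,pcpu,args` (a column
choice made here for easy parsing, not the format of the listing above) and an
arbitrary 90% threshold; find_runaway_pcsd and SAMPLE are illustrative names,
and SAMPLE is adapted from the listing above.

```python
def find_runaway_pcsd(ps_text: str, threshold: float = 90.0):
    """Return (pid, pcpu) for pcsd ruby processes in R state at or above threshold %CPU.

    Expects the text printed by: ps -eo pid,stat,pcpu,args
    """
    hits = []
    for line in ps_text.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)  # pid, stat, pcpu, args (args may contain spaces)
        if len(parts) < 4:
            continue
        pid, stat, pcpu, args = parts
        if "/usr/share/pcsd/ssl.rb" in args and stat.startswith("R") and float(pcpu) >= threshold:
            hits.append((int(pid), float(pcpu)))
    return hits


# Adapted from the ps listing above, reduced to the columns this sketch expects.
SAMPLE = """\
  PID STAT %CPU COMMAND
 6103 Ssl   0.0 /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb
24923 Rl   99.8 /usr/bin/ruby -C/var/lib/pcsd -I/usr/share/pcsd -- /usr/share/pcsd/ssl.rb
"""

if __name__ == "__main__":
    for pid, pcpu in find_runaway_pcsd(SAMPLE):
        print(f"possible runaway pcsd child {pid} at {pcpu}% CPU")
```

On a live node the input could come from
`subprocess.run(["ps", "-eo", "pid,stat,pcpu,args"], capture_output=True, text=True).stdout`.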

I don't have gcore installed and don't know which package might provide it.  I 
also don't have experience with gdb but am happy to try anything suggested to 
help figure out what's going on.

The pcs version is 0.9.149, as packaged by Debian and inherited by Ubuntu.

Regards,
-- 
Casey
_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
