On 03/02/17 16:08 -0500, Scott Greenlese wrote:
> Over the past few days, I noticed that pcsd and ruby process is pegged at
> 99% CPU, and commands such as pcs status pcsd take up to 5 minutes to
> complete.
> On all active cluster nodes, top shows:
>
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> 27225 haclust+  20   0  116324  91600  23136 R  99.3  0.1   1943:40 cib
> 23277 root      20   0 12.868g 8.176g   8460 S  99.7 13.0 407:44.18 ruby
>
> [...]
>
> I would appreciate some guidance how to proceed with debugging this issue.
> I have not taken any recovery actions yet.
> I considered stopping the cluster, recycling pcsd.service on all nodes,
> restarting cluster... and also, reboot the nodes, if
> necessary. But, didn't want to clear it yet in case there's anything I can
> capture while in this state.

If you still have the pcsd/ruby process in that state, it might be worth
dumping a core for further off-line examination. Assuming you have enough
space to store it (on the order of gigabytes, it seems) and gdb installed,
you can do it like this:

  gcore -o pcsd.core 23277

I have no idea how far gdb's support for interpreting Ruby goes (Python
is quite well supported in terms of high-level debugging), but it could be
enough for figuring out what's going on.

If you are confident enough that your cluster configuration does not
contain anything too confidential, it would perhaps be best if you shared
this core file in a compressed form privately with tojeline at redhat.

Otherwise, you can use gdb itself to look around the call stack in the
core file, the strings utility to guess whether there's excessive
accumulation of particular strings, and similar analyses. Some of these
are also applicable to a live process, and some are usable only on a live
process (like strace).

Hope this helps at least a bit.

-- 
Jan (Poki)
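
As a rough illustration of the strings-based check mentioned above, the
following sketch ranks the most frequently repeated printable strings in a
core file. The helper name top_strings and its defaults are made up for
this example and are not part of pcs, gdb or binutils. The file name
assumes that gcore appended the PID to the prefix given with -o, which it
normally does; adjust it to whatever file was actually written.

```shell
#!/bin/sh
# Hypothetical helper: list the most frequently repeated printable strings
# in a file, e.g. a core dump. A few strings dominating the list can hint
# at what is accumulating in the process memory.
top_strings() {
    file=$1          # path to the core file
    count=${2:-20}   # how many distinct strings to report (assumed default)
    minlen=${3:-8}   # skip strings shorter than this (assumed default)
    # -a scans the whole file, -n sets the minimum string length
    strings -a -n "$minlen" "$file" | sort | uniq -c | sort -rn | head -n "$count"
}

# Possible usage (file name and ruby path are assumptions):
#   top_strings pcsd.core.23277 20
#   gdb -batch -ex 'thread apply all bt' /usr/bin/ruby pcsd.core.23277
```

The gdb line in the comment only prints native (C-level) backtraces of all
threads; whether it reveals anything about the Ruby-level state is an open
question, as noted above.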
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org