On 09/11/15 13:11 +0000, Karthikeyan Ramasamy wrote:
> root 13405 1 0 13:42 ? 00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 13566 13405 0 13:42 ? 00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 13623 13566 0 13:42 ? 00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 13758 13566 0 13:42 ? 00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 13784 13623 0 13:42 ? 00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14146 13566 0 13:42 ? 00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14167 13623 0 13:42 ? 00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14193 13784 0 13:42 ? 00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14284 13758 0 13:42 ? 00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14381 13784 0 13:42 ? 00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14469 14284 0 13:42 ? 00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14589 13405 0 13:42 ? 00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14837 14381 0 13:42 ? 00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14860 13566 0 13:42 ? 00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14977 14589 0 13:42 ? 00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 19816 14167 0 13:43 ? 00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 19845 19816 0 13:43 ? 00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
>
> From the above it looks that one crm_mon spawns another crm_mon processes and
> keeps building.

Yep, see the attached PID scheme.

My guess is that the script PCSESA.sh is in fact an accidental "soft"
fork bomb that could be reduced to something like this t.sh script:

  echo -e '#!/bin/sh\nwhile true; do sleep 15; (eval "$0" "$@" &); done' > t.sh
  chmod +x t.sh
  ./t.sh --foo bar

What puzzles me, though, is that the same PID file used in the nested
executions is not preventing this sort of recursion, and I am wondering
if "open(..., | O_SYNC)" or an explicit fsync after the write would be
of any help here (smells like a filesystem-level race condition).

> Can you please let us know if there is anything else we have to
> check or still there could be issues with the script?

Karthik, would you be able to provide a somewhat reduced version of
PCSESA.sh (as requested by Ken) that still reproduces the issue?

-- 
Jan (Poki)
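
To illustrate the kind of guard one might expect a PID file to provide against the recursion described above, here is a minimal sketch of a re-entrancy lock for a shell script. It is an illustration only: it is not taken from crm_mon or PCSESA.sh, and the lock path and the function names acquire_lock and release_lock are hypothetical. It relies on mkdir either creating the directory or failing, so two instances racing for the lock cannot both succeed. Stale locks left behind by a crashed instance are not handled.

```sh
#!/bin/sh
# Hypothetical lock location; not a path used by crm_mon or PCSESA.sh.
LOCKDIR="${LOCKDIR:-/tmp/t.sh.lock}"

# acquire_lock: returns 0 only for the first caller. mkdir fails if the
# directory already exists, so there is no check-then-create window.
acquire_lock() {
    mkdir "$LOCKDIR" 2>/dev/null || return 1
    # Record the owner's PID for debugging; the directory itself is the lock.
    echo "$$" > "$LOCKDIR/pid"
}

# release_lock: remove the lock so that a later run can start again.
release_lock() {
    rm -rf "$LOCKDIR"
}
```

A script such as t.sh could call acquire_lock near the top, exit quietly when it fails, and call release_lock from an EXIT trap, so that a nested invocation stops instead of starting its own loop.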
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org