Hi,

My cluster was running well until yesterday. Today I submitted some jobs
and noticed they were taking far longer than usual. On investigating, I
found that the "Child" processes created by earlier MapReduce jobs were not
being killed, even though no jobs were running on the cluster. There were
around 40-50 of these child processes, each consuming memory, which led to
heavy swapping. If I kill the child processes with kill -9 and re-run the
jobs, the problem goes away for a while. Then the child processes again stop
getting killed, and eventually the swapping returns. What could be causing
this, and how can I fix it?

Thanks,
Hari
