Can you dig in more, Hari? When a child process won't go down, try figuring out what it's doing. Thread-dump it or study its logs.
St.Ack
On Tue, Dec 7, 2010 at 4:36 AM, Hari Sreekumar <[email protected]> wrote:
> Hi,
>
> My cluster was running great till yesterday. Today, I submitted some
> jobs and I saw that the jobs were taking way too long. On investigation, I
> saw that the "Child" processes created by previous MR jobs were not getting
> killed, even though no jobs were running on the cluster, and there were like
> 40-50 child processes, each consuming memory, leading to huge swapping. When
> I kill -9 the child processes and re-run the jobs, I don't encounter this
> problem for some time, and then again the child processes don't get killed
> and eventually swapping happens. What could be the reason/solution?
>
> Thanks,
> Hari
>
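In case it helps with the thread-dump suggestion, here is a rough Python sketch of one way to collect dumps from the lingering task JVMs before you kill -9 them. It assumes the JDK tools `jps` and `jstack` are on the PATH, that it runs as the same user that owns the task JVMs, and that the stuck tasks appear in `jps` output with the main class `Child`. The function names and the output file names are made up for illustration and are not part of Hadoop.

```python
import subprocess


def find_child_pids(jps_output):
    """Return PIDs of JVMs that jps reports with main class 'Child'."""
    pids = []
    for line in jps_output.splitlines():
        parts = line.split()
        # jps prints "<pid> <main class>"; with -l the class is fully qualified,
        # so compare only the last dotted component.
        if len(parts) >= 2 and parts[0].isdigit():
            if parts[1].split(".")[-1] == "Child":
                pids.append(int(parts[0]))
    return pids


def dump_threads(pid):
    """Run jstack against one JVM and return whatever it printed."""
    result = subprocess.run(
        ["jstack", str(pid)], capture_output=True, text=True, check=False
    )
    return result.stdout


def dump_all_children():
    """Write one thread dump file per lingering Child JVM (hypothetical file names)."""
    jps = subprocess.run(["jps"], capture_output=True, text=True, check=True)
    for pid in find_child_pids(jps.stdout):
        with open("child-%d.jstack.txt" % pid, "w") as out:
            out.write(dump_threads(pid))
```

Calling `dump_all_children()` on a node while no jobs are running should leave one file per stuck process. Comparing where the threads are blocked across a few of those files is usually more telling than a single dump.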
