Hello all, We run a Hadoop cluster of 60 datanodes (0.20.2, CDH3u4). Almost all jobs are scheduled via Oozie. Recently I’ve noticed that we get ~200 oozie.log entries per minute for actions of killed workflows:
[26/12/2013:16:00:55 PST] pool-2-thread-30 WARN org.apache.oozie.command.wf.SignalCommand: USER[oozie] GROUP[users] TOKEN[] APP[sv1_prod_client-20131001T1303-kec-W] JOB[0063764-131209015949959-oozie-tell-W] ACTION[0063764-131209015949959-oozie-tell-W@check-work-directory] Workflow not RUNNING, current status [KILLED] These workflow were in fact killed using ‘oozie job -kill' command. But these updates related actions continue forever. Each new oozie kill results in one or even two more zombie actions. Where are these zombies come from? How do we stop them? Thanks, mtjr
