Hey,

We have an Oozie workflow that triggers multiple MR actions. The workflow failed last week when one of the MR actions failed with:

MR002: Unknown hadoop job [job_201301292116_65447] associated with action [0000161-130130163537664-oozie-oozi-W@topic_mr_1016227728]. Failing this action!

The corresponding MR job did exist, and it had run successfully. That was the first time we had seen this error, and we are still trying to reproduce it. Moreover, there is nothing in the logs (Oozie, Hadoop) to indicate the root cause.

I found this link while researching the error, but I don't know what the resolution steps are:
http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201203.mbox/%3c92308400.32028.1332178177590.javamail.tom...@hel.zones.apache.org%3E

One thing we have observed in the past that might be related is that JobClient.getJob(job id) was returning null for some MR jobs. We still see this randomly and haven't resolved it yet.

We are using Oozie 2.3.2-cdh3u3 and Hadoop 0.20.2-cdh3u3.

Appreciate any help.

Thanks,
Siva
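
P.S. To make the JobClient.getJob observation concrete, here is a minimal sketch of the kind of retry that could be wrapped around a lookup that sometimes returns null. It is only an illustration, not something we run: the class name JobLookupRetry, the helper retryUntilNonNull, and the attempt count and delay are all hypothetical, and the stand-in lookup in main merely simulates a call that returns null twice before succeeding. It would only help if the null result is transient, which we have not confirmed.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: retries a lookup that may transiently return null.
class JobLookupRetry {

    /**
     * Calls lookup up to maxAttempts times, sleeping delayMillis between
     * attempts. Returns the first non-null result, or null if every
     * attempt returned null.
     */
    static <T> T retryUntilNonNull(Callable<T> lookup, int maxAttempts, long delayMillis)
            throws Exception {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T result = lookup.call();
            if (result != null) {
                return result;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delayMillis);
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a real lookup such as jobClient.getJob(jobId);
        // it returns null on the first two calls to simulate the symptom.
        final int[] calls = {0};
        String job = retryUntilNonNull(new Callable<String>() {
            public String call() {
                calls[0]++;
                return calls[0] < 3 ? null : "job_201301292116_65447";
            }
        }, 5, 10L);
        System.out.println(job + " found after " + calls[0] + " attempts");
    }
}
```

In real code the Callable would wrap the actual JobClient call, and a null after the last attempt would still need to be handled by the caller.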
