Thanks, Amit. I double-checked that and did find a duplicated 'jobTracker' property in my job.properties file, and one of its values might have been problematic (localhost instead of the actual IP). But even after fixing that, I still see the same behavior.
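A duplicated key like this is easy to miss by eye. If the file is read with java.util.Properties, a repeated key does not raise an error; the last value read simply wins. The sketch below flags every key that is defined more than once. It is a hypothetical helper, not part of Oozie. It uses a simplified .properties parser that ignores backslash escapes and line continuations, and the sample values in the test are made up.

```python
import re
import sys
from collections import defaultdict

# Simplified .properties key/value split (assumption): the key runs up to the
# first '=', ':' or whitespace. Backslash escapes and line continuations,
# which java.util.Properties does support, are NOT handled here.
_ENTRY = re.compile(r"([^=:\s]+)\s*[=:\s]\s*(.*)")


def find_duplicate_keys(lines):
    """Return {key: [(line_no, value), ...]} for every key defined more than once."""
    seen = defaultdict(list)
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank line or comment
        match = _ENTRY.match(line)
        if match is None:
            seen[line].append((line_no, ""))  # bare key with no value
        else:
            seen[match.group(1)].append((line_no, match.group(2)))
    return {key: hits for key, hits in seen.items() if len(hits) > 1}


def report_duplicates(path):
    """Print each duplicated key; return 1 if any were found, else 0."""
    # .properties files are traditionally ISO-8859-1 (latin-1) encoded.
    with open(path, encoding="latin-1") as fh:
        duplicates = find_duplicate_keys(fh)
    for key, hits in sorted(duplicates.items()):
        print(f"{key} is defined {len(hits)} times:")
        for line_no, value in hits:
            print(f"  line {line_no}: {value}")
    return 1 if duplicates else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(report_duplicates(sys.argv[1]))
```

Run it as, for example, `python find_dup_props.py job.properties` (the script name is hypothetical). It exits with status 1 if any key is repeated.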
The only thing in the job log, as shown in the Oozie web console, is this:

2014-04-21 21:34:38,504 INFO ActionStartXCommand:539 - USER[hadoop] GROUP[-] TOKEN[] APP[shell-wf] JOB[0000000-140421213147611-oozie-hado-W] ACTION[0000000-140421213147611-oozie-hado-W@:start:] Start action [0000000-140421213147611-oozie-hado-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2014-04-21 21:34:38,508 WARN ActionStartXCommand:542 - USER[hadoop] GROUP[-] TOKEN[] APP[shell-wf] JOB[0000000-140421213147611-oozie-hado-W] ACTION[0000000-140421213147611-oozie-hado-W@:start:] [***0000000-140421213147611-oozie-hado-W@:start:***]Action status=DONE
2014-04-21 21:34:38,509 WARN ActionStartXCommand:542 - USER[hadoop] GROUP[-] TOKEN[] APP[shell-wf] JOB[0000000-140421213147611-oozie-hado-W] ACTION[0000000-140421213147611-oozie-hado-W@:start:] [***0000000-140421213147611-oozie-hado-W@:start:***]Action updated in DB!
2014-04-21 21:34:39,142 INFO ActionStartXCommand:539 - USER[hadoop] GROUP[-] TOKEN[] APP[shell-wf] JOB[0000000-140421213147611-oozie-hado-W] ACTION[0000000-140421213147611-oozie-hado-W@shell-node] Start action [0000000-140421213147611-oozie-hado-W@shell-node] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2014-04-21 21:34:40,334 WARN ShellActionExecutor:542 - USER[hadoop] GROUP[-] TOKEN[] APP[shell-wf] JOB[0000000-140421213147611-oozie-hado-W] ACTION[0000000-140421213147611-oozie-hado-W@shell-node] credentials is null for the action

Nothing there jumps out at me, but I'm not an Oozie expert.

Thanks,

Phil

On Mon, Apr 21, 2014 at 5:25 PM, Amit Patil <[email protected]> wrote:

> Make sure that the jobtracker and name node url:ports that you specified are
> reachable from the oozie server.
>
> On Mon, Apr 21, 2014 at 2:23 PM, Phillip Rhodes
> <[email protected]> wrote:
>
> > Gang:
> >
> > More exploring reveals this:
> >
> > The Hive task hangs even when run as a straightforward workflow job, not
> > using the coordinator stuff at all. But even more to the point, a basic
> > workflow job using the "shell" task also hangs in the same way. The node
> > for the "shell" task goes into PREP and nothing ever happens.
> >
> > Any ideas what might be causing this behavior?
> >
> > Thanks,
> >
> > Phil
> >
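Amit's suggestion can be checked directly on the Oozie server host. The script below is a minimal sketch that attempts a plain TCP connection to each address it is given. You pass it the nameNode and jobTracker values from job.properties. The script and its helper names are hypothetical, not an Oozie tool. A successful connection only shows that the port is reachable over the network. It does not prove that the service behind it is healthy or that Oozie's Hadoop configuration is correct. The parser also assumes an explicit port and does not handle IPv6 literals.

```python
import socket
import sys


def parse_host_port(value):
    """Split 'host:port' or 'scheme://host:port/path' into (host, port).

    Simplified (assumption): IPv6 literals and values without an explicit
    port are not supported.
    """
    if "://" in value:
        value = value.split("://", 1)[1]  # drop e.g. "hdfs://"
    value = value.split("/", 1)[0]  # drop any trailing path
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {value!r}")
    return host, int(port)


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timed out
        return False


def check_all(values):
    """Print OK/FAIL for each address; return 0 only if every one is reachable."""
    status = 0
    for value in values:
        host, port = parse_host_port(value)
        ok = is_reachable(host, port)
        print(f"{'OK  ' if ok else 'FAIL'} {value}")
        if not ok:
            status = 1
    return status


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(check_all(sys.argv[1:]))
```

As an example, `python check_ports.py hdfs://10.0.0.5:8020 10.0.0.5:8021` uses a made-up script name and made-up addresses. Substitute the actual values from job.properties.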
