Hi Jess,
Your analysis is correct.

If you never received a callback for job ID 
0000014-121016184312009-oozie-oozi-W, the most likely cause is a problem on 
the Hadoop side. What version of Hadoop are you using? Is it a secured Hadoop 
cluster?

Can you please check the JobTracker (JT) log around that time frame? There 
might be some relevant messages. One bad callback can slow down all the other 
callbacks from the JT, so you might even receive your callback a few hours 
late. The JT currently uses a single thread to dispatch all callbacks.
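
To confirm which jobs actually got a callback, you could scan the Oozie log 
for CallbackServlet records. Here is a minimal Python sketch, assuming one 
record per line in the format you quoted below. The regex is inferred from 
those two sample records only, and jobs_by_callback is just an illustrative 
name, not part of Oozie:

```python
import re

# Timestamp, log level, logger class, and JOB[...] field of one Oozie log record.
# Assumption: the layout matches the two sample records quoted in this thread;
# other Oozie versions or log4j layouts may need a different pattern.
RECORD = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+"
    r"\w+\s+(?P<logger>\w+):\d+\s+-\s+.*?JOB\[(?P<job>[^\]]*)\]"
)


def jobs_by_callback(log_text):
    """Return (jobs_with_callback, jobs_without_callback) as sorted lists."""
    seen = set()
    called_back = set()
    for line in log_text.splitlines():
        match = RECORD.match(line)
        if match is None:
            continue  # not a record in the assumed format
        job = match.group("job")
        if job in ("", "-"):
            continue  # record is not tied to a workflow job
        seen.add(job)
        if match.group("logger") == "CallbackServlet":
            called_back.add(job)
    return sorted(called_back), sorted(seen - called_back)
```

You would call it with the contents of your Oozie server log, for example 
jobs_by_callback(open(path).read()), where path is wherever your install 
writes that log. Jobs in the second list only ever completed through polling.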

Regards,
Mohammad


________________________________
From: Jess Sheneberger <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Wednesday, October 17, 2012 11:03 AM
Subject: callbacks not happening, short jobs in RUNNING state for 10 min

Hi,

I'm trying out Oozie for the first time. When I first started running the 
examples they'd complete fairly quickly, but now they're taking a long time 
(10+ minutes) to complete.  I think the callbacks aren't working, because in 
the first few runs I can see a job log entry for CallbackServlet, but I don't 
see this on my most recent job runs.

It looks like the action (shell, java, etc) from the example that reads 
arguments and writes back to stdout is running quickly, and then Oozie leaves 
the job in RUNNING state for 10 minutes until it polls it.  Any idea what could 
be messing up the callback?

In the first few runs I saw this in the job log, just a few seconds after the 
job transitioned to RUNNING:

2012-10-16 19:35:45,063  INFO CallbackServlet:539 - USER[-] GROUP[-] TOKEN[-] 
APP[-] JOB[0000002-121016184312009-oozie-oozi-W] 
ACTION[0000002-121016184312009-oozie-oozi-W@shell1] callback for action 
[0000002-121016184312009-oozie-oozi-W@shell1]

Now, after the job transitions to RUNNING, I see this about 10 minutes later:

2012-10-17 11:52:29,640  INFO JavaActionExecutor:539 - USER[jess] GROUP[-] 
TOKEN[] APP[java-main-wf] JOB[0000014-121016184312009-oozie-oozi-W] 
ACTION[0000014-121016184312009-oozie-oozi-W@java-node] action completed, 
external ID [job_201210161828_0013]

Which must be the poller kicking in and realizing the task has completed, but 
why isn't the callback happening?  How can I troubleshoot this?

Thanks
Jess
