Fixed this; it was a matter of updating /etc/dhcp/dhclient.conf with 
'supersede domain-name "real.domain.com";' to override the buggy domain name 
that DHCP was putting in /etc/resolv.conf.  Not a hadoop/oozie issue at all, 
just a newbie Ubuntu problem.

Now it's working and I get the task finished callbacks without any issues.  
Thanks!
________________________________________
From: Jess Sheneberger
Sent: Wednesday, October 17, 2012 2:31 PM
To: [email protected]; Mohammad Islam
Subject: RE: callbacks not happening, short jobs in RUNNING state for 10 min

Thanks Mohammad, I found the exception in the tracker log, and it was failing 
to resolve the domain name because I had removed it from the hosts file trying 
to troubleshoot another issue.

So this may be way out of scope for you and/or this list, but where do 
Oozie/Hadoop get the full domain name of the local machine?  My /etc/hostname 
specifies only the short name and my network's DNS is wrong--is there a place I 
can override this?
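
One way to see what name a JVM process resolves for the local machine is to ask 
java.net.InetAddress directly. This is a minimal sketch, assuming that Hadoop 
and Oozie get the name through the JVM's normal resolver path (hosts file, then 
DNS); neither reply in this thread confirms that, and the class name 
LocalHostNameCheck is made up for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LocalHostNameCheck {
    /**
     * Returns the host name, canonical (fully qualified) name, and address
     * that the JVM resolves for this machine, or a marker string when the
     * local host name cannot be resolved at all.
     */
    static String describeLocalHost() {
        try {
            InetAddress local = InetAddress.getLocalHost();
            return "hostname=" + local.getHostName()
                 + " canonical=" + local.getCanonicalHostName()
                 + " address=" + local.getHostAddress();
        } catch (UnknownHostException e) {
            // Typical when the name is missing from both /etc/hosts and DNS.
            return "unresolvable: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeLocalHost());
    }
}
```

If the canonical name printed here is wrong or the lookup fails, the fix 
probably belongs in the OS resolver configuration rather than in Hadoop or 
Oozie.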
________________________________________
From: Mohammad Islam [[email protected]]
Sent: Wednesday, October 17, 2012 12:28 PM
To: [email protected]
Subject: Re: callbacks not happening, short jobs in RUNNING state for 10 min

Hi Jess,
Your analysis is correct.

If you never received any callback for job id 
0000014-121016184312009-oozie-oozi-W, most likely Hadoop has some issue. What 
version of Hadoop are you using? Is it secure Hadoop?

Can you please check the JobTracker (JT) log around that time frame? Some 
relevant messages might be there. One bad callback can slow down all the 
callbacks from the JT. You might even receive your callback a few hours late 
because of delayed delivery from the JT, which currently uses a single thread 
to dispatch all the callbacks.
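
To illustrate why one bad callback delays the rest when a single thread does 
all the dispatching, here is a small sketch. It is not the JobTracker's actual 
code; the class name, the callback names, and the sleep that stands in for a 
slow or failing HTTP call are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SingleThreadCallbackDemo {
    /**
     * Runs each simulated callback on ONE dispatcher thread and returns the
     * names in the order they finished. delaysMillis[i] stands in for how
     * long callback i takes (for example, a DNS lookup that hangs).
     */
    static List<String> dispatch(String[] names, long[] delaysMillis)
            throws InterruptedException {
        List<String> finished = Collections.synchronizedList(new ArrayList<String>());
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        for (int i = 0; i < names.length; i++) {
            final String name = names[i];
            final long delay = delaysMillis[i];
            dispatcher.execute(() -> {
                try {
                    Thread.sleep(delay); // simulated slow or stuck callback
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                finished.add(name);
            });
        }
        dispatcher.shutdown();
        dispatcher.awaitTermination(10, TimeUnit.SECONDS);
        return finished;
    }

    public static void main(String[] args) throws InterruptedException {
        // The slow callback is queued first, so the two fast ones wait behind it.
        System.out.println(dispatch(
                new String[] {"bad-host", "job-A", "job-B"},
                new long[] {500, 0, 0}));
    }
}
```

Running main prints [bad-host, job-A, job-B]: the two instant callbacks cannot 
complete until the slow one ahead of them in the queue has finished.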

Regards,
Mohammad


________________________________
From: Jess Sheneberger <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Wednesday, October 17, 2012 11:03 AM
Subject: callbacks not happening, short jobs in RUNNING state for 10 min

Hi,

I'm trying out Oozie for the first time, and when I first started running the 
examples they'd complete fairly quickly, and now they're taking a long time 
(10+ minutes) to complete.  I think the callbacks aren't working, because in 
the first few runs I can see a job log entry for CallbackServlet, but I don't 
see this on my most recent job runs.

It looks like the action (shell, java, etc) from the example that reads 
arguments and writes back to stdout is running quickly, and then Oozie leaves 
the job in RUNNING state for 10 minutes until it polls it.  Any idea what could 
be messing up the callback?

In the first few runs I saw this in the job log, just a few seconds after the 
job transitioned to RUNNING:

2012-10-16 19:35:45,063  INFO CallbackServlet:539 - USER[-] GROUP[-] TOKEN[-] 
APP[-] JOB[0000002-121016184312009-oozie-oozi-W] 
ACTION[0000002-121016184312009-oozie-oozi-W@shell1] callback for action 
[0000002-121016184312009-oozie-oozi-W@shell1]

Now, after the job transitions to RUNNING, I see this about 10 minutes later:

2012-10-17 11:52:29,640  INFO JavaActionExecutor:539 - USER[jess] GROUP[-] 
TOKEN[] APP[java-main-wf] JOB[0000014-121016184312009-oozie-oozi-W] 
ACTION[0000014-121016184312009-oozie-oozi-W@java-node] action completed, 
external ID [job_201210161828_0013]

Which must be the poller kicking in and realizing the task has completed, but 
why isn't the callback happening?  How can I troubleshoot this?

Thanks
Jess
