Thanks for the replies.  I am seeing the following exception in the logs
during callback.


2014-06-28 15:14:46,048  INFO CallbackServlet:539 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[0000000-140628150326547-oozie-hado-W]
ACTION[0000000-140628150326547-oozie-hado-W@person-action] callback for
action [0000000-140628150326547-oozie-hado-W@person-action]
2014-06-28 15:14:46,062 ERROR CompletedActionXCommand:536 - USER[-]
GROUP[-] TOKEN[] APP[-] JOB[0000000-140628150326547-oozie-hado-W]
ACTION[0000000-140628150326547-oozie-hado-W@person-action] XException,
org.apache.oozie.command.CommandException: E0800: Action it is not running
its in [PREP] state, action
[0000000-140628150326547-oozie-hado-W@person-action]
        at
org.apache.oozie.command.wf.CompletedActionXCommand.eagerVerifyPrecondition(CompletedActionXCommand.java:85)
        at org.apache.oozie.command.XCommand.call(XCommand.java:251)
        at
org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)
2014-06-28 15:14:46,062  WARN CallableQueueService$CallableWrapper:542 -
USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] exception callable
[callback], E0800: Action it is not running its in [PREP] state, action
[0000000-140628150326547-oozie-hado-W@person-action]
org.apache.oozie.command.CommandException: E0800: Action it is not running
its in [PREP] state, action
[0000000-140628150326547-oozie-hado-W@person-action]
        at
org.apache.oozie.command.wf.CompletedActionXCommand.eagerVerifyPrecondition(CompletedActionXCommand.java:85)
        at org.apache.oozie.command.XCommand.call(XCommand.java:251)
        at
org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)
2014-06-28 15:14:46,076  WARN ActionStartXCommand:542 - USER[hadoop]
GROUP[-] TOKEN[] APP[main-job-wf] JOB[0000000-140628150326547-oozie-hado-W]
ACTION[0000000-140628150326547-oozie-hado-W@person-action]
[***0000000-140628150326547-oozie-hado-W@person-action***]Action
status=RUNNING
2014-06-28 15:14:46,076  WARN ActionStartXCommand:542 - USER[hadoop]
GROUP[-] TOKEN[] APP[main-job-wf] JOB[0000000-140628150326547-oozie-hado-W]
ACTION[0000000-140628150326547-oozie-hado-W@person-action]
[***0000000-140628150326547-oozie-hado-W@person-action***]Action updated in
DB!
2014-06-28 15:14:46,095  INFO ActionStartXCommand:539 - USER[hadoop]
GROUP[-] TOKEN[] APP[person-enrichment-job-wf]
JOB[0000004-140628150326547-oozie-hado-W]
ACTION[0000004-140628150326547-oozie-hado-W@:start:] Start action
[0000004-140628150326547-oozie-hado-W@:start:] with user-retry state :
userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2014-06-28 15:14:46,095  WARN ActionStartXCommand:542 - USER[hadoop]
GROUP[-] TOKEN[] APP[person-enrichment-job-wf]
JOB[0000004-140628150326547-oozie-hado-W]
ACTION[0000004-140628150326547-oozie-hado-W@:start:]
[***0000004-140628150326547-oozie-hado-W@:start:***]Action status=DONE
2014-06-28 15:14:46,095  WARN ActionStartXCommand:542 - USER[hadoop]
GROUP[-] TOKEN[] APP[person-enrichment-job-wf]
JOB[0000004-140628150326547-oozie-hado-W]
ACTION[0000004-140628150326547-oozie-hado-W@:start:]
[***0000004-140628150326547-oozie-hado-W@:start:***]Action updated in DB!



However the pause doesn't seem to happen in the callback.  It seems to have
something to do with the actions getting suspended as shown in the
following log entry.



2014-06-28 15:14:51,436  INFO JavaActionExecutor:539 - USER[hadoop]
GROUP[-] TOKEN[] APP[person-enrichment-job-wf]
JOB[0000004-140628150326547-oozie-hado-W]
ACTION[0000004-140628150326547-oozie-hado-W@personTelephoneEnrichmentJob]
checking action, external ID [job_201406260453_0554] status [RUNNING]
2014-06-28 15:14:51,439  WARN ActionStartXCommand:542 - USER[hadoop]
GROUP[-] TOKEN[] APP[person-enrichment-job-wf]
JOB[0000004-140628150326547-oozie-hado-W]
ACTION[0000004-140628150326547-oozie-hado-W@personTelephoneEnrichmentJob]
[***0000004-140628150326547-oozie-hado-W@personTelephoneEnrichmentJob***]Action
status=RUNNING
2014-06-28 15:14:51,440  WARN ActionStartXCommand:542 - USER[hadoop]
GROUP[-] TOKEN[] APP[person-enrichment-job-wf]
JOB[0000004-140628150326547-oozie-hado-W]
ACTION[0000004-140628150326547-oozie-hado-W@personTelephoneEnrichmentJob]
[***0000004-140628150326547-oozie-hado-W@personTelephoneEnrichmentJob***]Action
updated in DB!
2014-06-28 15:15:42,976  WARN ActionCheckXCommand:542 - USER[hadoop]
GROUP[-] TOKEN[] APP[person-enrichment-job-wf]
JOB[0000011-140515062648271-oozie-hado-W]
ACTION[0000011-140515062648271-oozie-hado-W@personEnrichmentJob] E0818:
Action [0000011-140515062648271-oozie-hado-W@personEnrichmentJob] status is
running but WF Job [0000011-140515062648271-oozie-hado-W] status is
[SUSPENDED]. Expected status is RUNNING., Error Code: E0818


Looking at the Hadoop jobtracker UI these jobs completed successfully
at Sat Jun 28 15:15:52 UTC 2014 which seems to be right around the
suspended messages.  These actions stay suspended for another ten minutes
before kicking off the next set of actions.


2014-06-28 15:25:43,603  INFO JavaActionExecutor:539 - USER[hadoop]
GROUP[-] TOKEN[] APP[person-enrichment-job-wf]
JOB[0000004-140628150326547-oozie-hado-W]
ACTION[0000004-140628150326547-oozie-hado-W@personEmailEnrichmentJob]
action completed, external ID [job_201406260453_0553]
2014-06-28 15:25:43,607  INFO JavaActionExecutor:539 - USER[hadoop]
GROUP[-] TOKEN[] APP[person-enrichment-job-wf]
JOB[0000004-140628150326547-oozie-hado-W]
ACTION[0000004-140628150326547-oozie-hado-W@personEmailEnrichmentJob]
action produced output



Is this still related to the callback issue you mention or is this
something else with the suspended actions?  FWIW I am able to connect to
the Oozie servers from all the nodes in the Hadoop cluster.

Thanks for the help!

Jonathan





On Fri, Jun 27, 2014 at 11:29 AM, Amit Patil <[email protected]>
wrote:

> I think the callback url is not reachable by all the nodes on your hadoop
> cluster. Since, the callback url fails, there is no way for the workers to
> notify the oozie server that the action has finished execution. After 10
> mins, the oozie server polls the worker node to get the status. Thats why
> you are seeing 10 mins pauses.
> Set the following variable in  oozie-env.sh to the ipaddres of the server
> instead of the hostname and restart the oozie server
> export OOZIE_HTTP_HOSTNAME=<server ip>
>
>
>
>
> On Fri, Jun 27, 2014 at 10:19 AM, Jonathan Hodges <[email protected]>
> wrote:
>
> > Hi,
> >
> > We have noticed some pauses between execution of action steps in our
> Oozie
> > workflows.  A job may take a couple minutes yet the next step isn't
> > triggered for sometimes ten minutes.  Is there a way to control this
> pause?
> >  We have a set of jobs that when executed at a command-line runs for
> about
> > 45 mins.  The same set of jobs in Oozie is over 3 hours.
> >
> > We tried setting the following properties, but we aren't using
> coordinators
> > so it didn't help.
> >
> > oozie.service.CoordMaterializeTriggerService.lookup.interval=30
> > oozie.service.CoordMaterializeTriggerService.materialization.window=7200
> >
> > Is there maybe similar properties at the workflow level we can set?
> >
> > Thanks so much,
> >
> > Jonathan
> >
>

Reply via email to