Is there a way I can determine the exact callback URL to verify
connectivity from each of the Hadoop nodes?  I guess maybe the actions are
getting suspended until callback URL is called or manual check every ten
minutes as mentioned.



On Sat, Jun 28, 2014 at 9:50 AM, Jonathan Hodges <[email protected]> wrote:

> Thanks for the replies.  I am seeing the following exception in the logs
> during callback.
>
>
> 2014-06-28 15:14:46,048  INFO CallbackServlet:539 - USER[-] GROUP[-]
> TOKEN[-] APP[-] JOB[0000000-140628150326547-oozie-hado-W]
> ACTION[0000000-140628150326547-oozie-hado-W@person-action] callback for
> action [0000000-140628150326547-oozie-hado-W@person-action]
> 2014-06-28 15:14:46,062 ERROR CompletedActionXCommand:536 - USER[-]
> GROUP[-] TOKEN[] APP[-] JOB[0000000-140628150326547-oozie-hado-W]
> ACTION[0000000-140628150326547-oozie-hado-W@person-action] XException,
> org.apache.oozie.command.CommandException: E0800: Action it is not running
> its in [PREP] state, action
> [0000000-140628150326547-oozie-hado-W@person-action]
>         at
> org.apache.oozie.command.wf.CompletedActionXCommand.eagerVerifyPrecondition(CompletedActionXCommand.java:85)
>         at org.apache.oozie.command.XCommand.call(XCommand.java:251)
>         at
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>         at java.lang.Thread.run(Thread.java:662)
> 2014-06-28 15:14:46,062  WARN CallableQueueService$CallableWrapper:542 -
> USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] exception callable
> [callback], E0800: Action it is not running its in [PREP] state, action
> [0000000-140628150326547-oozie-hado-W@person-action]
> org.apache.oozie.command.CommandException: E0800: Action it is not running
> its in [PREP] state, action
> [0000000-140628150326547-oozie-hado-W@person-action]
>         at
> org.apache.oozie.command.wf.CompletedActionXCommand.eagerVerifyPrecondition(CompletedActionXCommand.java:85)
>         at org.apache.oozie.command.XCommand.call(XCommand.java:251)
>         at
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>         at java.lang.Thread.run(Thread.java:662)
> 2014-06-28 15:14:46,076  WARN ActionStartXCommand:542 - USER[hadoop]
> GROUP[-] TOKEN[] APP[main-job-wf] JOB[0000000-140628150326547-oozie-hado-W]
> ACTION[0000000-140628150326547-oozie-hado-W@person-action]
> [***0000000-140628150326547-oozie-hado-W@person-action***]Action
> status=RUNNING
> 2014-06-28 15:14:46,076  WARN ActionStartXCommand:542 - USER[hadoop]
> GROUP[-] TOKEN[] APP[main-job-wf] JOB[0000000-140628150326547-oozie-hado-W]
> ACTION[0000000-140628150326547-oozie-hado-W@person-action]
> [***0000000-140628150326547-oozie-hado-W@person-action***]Action updated
> in DB!
> 2014-06-28 15:14:46,095  INFO ActionStartXCommand:539 - USER[hadoop]
> GROUP[-] TOKEN[] APP[person-enrichment-job-wf]
> JOB[0000004-140628150326547-oozie-hado-W]
> ACTION[0000004-140628150326547-oozie-hado-W@:start:] Start action
> [0000004-140628150326547-oozie-hado-W@:start:] with user-retry state :
> userRetryCount [0], userRetryMax [0], userRetryInterval [10]
> 2014-06-28 15:14:46,095  WARN ActionStartXCommand:542 - USER[hadoop]
> GROUP[-] TOKEN[] APP[person-enrichment-job-wf]
> JOB[0000004-140628150326547-oozie-hado-W]
> ACTION[0000004-140628150326547-oozie-hado-W@:start:]
> [***0000004-140628150326547-oozie-hado-W@:start:***]Action status=DONE
> 2014-06-28 15:14:46,095  WARN ActionStartXCommand:542 - USER[hadoop]
> GROUP[-] TOKEN[] APP[person-enrichment-job-wf]
> JOB[0000004-140628150326547-oozie-hado-W]
> ACTION[0000004-140628150326547-oozie-hado-W@:start:]
> [***0000004-140628150326547-oozie-hado-W@:start:***]Action updated in DB!
>
>
>
> However the pause doesn't seem to happen in the callback.  It seems to
> have something to do with the actions getting suspended as shown in the
> following log entry.
>
>
>
> 2014-06-28 15:14:51,436  INFO JavaActionExecutor:539 - USER[hadoop]
> GROUP[-] TOKEN[] APP[person-enrichment-job-wf]
> JOB[0000004-140628150326547-oozie-hado-W]
> ACTION[0000004-140628150326547-oozie-hado-W@personTelephoneEnrichmentJob]
> checking action, external ID [job_201406260453_0554] status [RUNNING]
> 2014-06-28 15:14:51,439  WARN ActionStartXCommand:542 - USER[hadoop]
> GROUP[-] TOKEN[] APP[person-enrichment-job-wf]
> JOB[0000004-140628150326547-oozie-hado-W]
> ACTION[0000004-140628150326547-oozie-hado-W@personTelephoneEnrichmentJob]
> [***0000004-140628150326547-oozie-hado-W@personTelephoneEnrichmentJob***]Action
> status=RUNNING
> 2014-06-28 15:14:51,440  WARN ActionStartXCommand:542 - USER[hadoop]
> GROUP[-] TOKEN[] APP[person-enrichment-job-wf]
> JOB[0000004-140628150326547-oozie-hado-W]
> ACTION[0000004-140628150326547-oozie-hado-W@personTelephoneEnrichmentJob]
> [***0000004-140628150326547-oozie-hado-W@personTelephoneEnrichmentJob***]Action
> updated in DB!
> 2014-06-28 15:15:42,976  WARN ActionCheckXCommand:542 - USER[hadoop]
> GROUP[-] TOKEN[] APP[person-enrichment-job-wf]
> JOB[0000011-140515062648271-oozie-hado-W]
> ACTION[0000011-140515062648271-oozie-hado-W@personEnrichmentJob] E0818:
> Action [0000011-140515062648271-oozie-hado-W@personEnrichmentJob] status
> is running but WF Job [0000011-140515062648271-oozie-hado-W] status is
> [SUSPENDED]. Expected status is RUNNING., Error Code: E0818
>
>
> Looking at the Hadoop jobtracker UI these jobs completed successfully
> at Sat Jun 28 15:15:52 UTC 2014 which seems to be right around the
> suspended messages.  These actions stay suspended for another ten minutes
> before kicking off the next set of actions.
>
>
> 2014-06-28 15:25:43,603  INFO JavaActionExecutor:539 - USER[hadoop]
> GROUP[-] TOKEN[] APP[person-enrichment-job-wf]
> JOB[0000004-140628150326547-oozie-hado-W]
> ACTION[0000004-140628150326547-oozie-hado-W@personEmailEnrichmentJob]
> action completed, external ID [job_201406260453_0553]
> 2014-06-28 15:25:43,607  INFO JavaActionExecutor:539 - USER[hadoop]
> GROUP[-] TOKEN[] APP[person-enrichment-job-wf]
> JOB[0000004-140628150326547-oozie-hado-W]
> ACTION[0000004-140628150326547-oozie-hado-W@personEmailEnrichmentJob]
> action produced output
>
>
>
> Is this still related to the callback issue you mention or is this
> something else with the suspended actions?  FWIW I am able to connect to
> the Oozie servers from all the nodes in the Hadoop cluster.
>
> Thanks for the help!
>
> Jonathan
>
>
>
>
>
> On Fri, Jun 27, 2014 at 11:29 AM, Amit Patil <[email protected]>
> wrote:
>
>> I think the callback url is not reachable by all the nodes on your hadoop
>> cluster. Since, the callback url fails, there is no way for the workers to
>> notify the oozie server that the action has finished execution. After 10
>> mins, the oozie server polls the worker node to get the status. Thats why
>> you are seeing 10 mins pauses.
>> Set the following variable in  oozie-env.sh to the ipaddres of the server
>> instead of the hostname and restart the oozie server
>> export OOZIE_HTTP_HOSTNAME=<server ip>
>>
>>
>>
>>
>> On Fri, Jun 27, 2014 at 10:19 AM, Jonathan Hodges <[email protected]>
>> wrote:
>>
>> > Hi,
>> >
>> > We have noticed some pauses between execution of action steps in our
>> Oozie
>> > workflows.  A job may take a couple minutes yet the next step isn't
>> > triggered for sometimes ten minutes.  Is there a way to control this
>> pause?
>> >  We have a set of jobs that when executed at a command-line runs for
>> about
>> > 45 mins.  The same set of jobs in Oozie is over 3 hours.
>> >
>> > We tried setting the following properties, but we aren't using
>> coordinators
>> > so it didn't help.
>> >
>> > oozie.service.CoordMaterializeTriggerService.lookup.interval=30
>> > oozie.service.CoordMaterializeTriggerService.materialization.window=7200
>> >
>> > Is there maybe similar properties at the workflow level we can set?
>> >
>> > Thanks so much,
>> >
>> > Jonathan
>> >
>>
>
>

Reply via email to