Hi,
I think this is exactly what we encountered. Good timing on the ticket, too.
Thanks for the heads-up!

________________________________________
From: Robert Kanter [[email protected]]
Sent: June 20, 2014 1:39 PM
To: [email protected]
Subject: Re: Invalid execution path on job rerun

Hi Denis,

This sounds like it could be OOZIE-1879
<https://issues.apache.org/jira/browse/OOZIE-1879>, which I recently
committed a patch for. The issue there was a workflow in which an action
after a fork failed, and the rerun then failed with "invalid execution
path". It wasn't easy to track down, but it turns out that during a rerun,
Oozie goes through all of the actions, and for a fork it processes them in
the order they are listed in the fork's XML. If the forked actions finished
in a different order during the original run, you get this error. A
workaround would be to list the actions in the fork in the order that
they're likely to complete, but that's probably not practical. Otherwise,
you'll need the OOZIE-1879 patch to fix this.
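
To make the failure condition concrete, here is a minimal Python sketch. It
is a toy model, not Oozie's actual code. It assumes, based only on the
observation in the original report quoted below (the rerun succeeded only
when join_2 had recorded /bash_cp_12/), that a rerun is safe when the
execution path recorded for the join matches the last path listed in the
fork. The names fork_paths, rerun_likely_fails, and SAMPLE_WORKFLOW are
hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the workflow definition quoted in this thread.
WF_NS = {"wf": "uri:oozie:workflow:0.4"}

# Trimmed, hypothetical copy of fork_2 (actions omitted) for illustration.
SAMPLE_WORKFLOW = """\
<workflow-app xmlns="uri:oozie:workflow:0.4" name="HelloWorld">
  <fork name="fork_2">
    <path start="bash_cp_3" />
    <path start="bash_cp_4" />
    <path start="bash_cp_6" />
    <path start="bash_cp_8" />
    <path start="bash_cp_10" />
    <path start="bash_cp_12" />
  </fork>
</workflow-app>
"""


def fork_paths(workflow_xml, fork_name):
    """Return the fork's path names in the order the XML lists them."""
    root = ET.fromstring(workflow_xml)
    for fork in root.findall("wf:fork", WF_NS):
        if fork.get("name") == fork_name:
            return [path.get("start") for path in fork.findall("wf:path", WF_NS)]
    raise KeyError(f"no fork named {fork_name!r}")


def rerun_likely_fails(listed_paths, recorded_join_path):
    """Toy rule (an assumption): safe only if the join recorded the last listed path."""
    return recorded_join_path != f"/{listed_paths[-1]}/"


if __name__ == "__main__":
    paths = fork_paths(SAMPLE_WORKFLOW, "fork_2")
    print(rerun_likely_fails(paths, "/bash_cp_3/"))   # True
    print(rerun_likely_fails(paths, "/bash_cp_12/"))  # False
```

The recorded path would come from the execution_path column of the join's
row in wf_actions. Treat the result as a hint about whether a workflow is
exposed to this problem, not as a statement of how Oozie decides.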

- Robert


On Fri, Jun 20, 2014 at 8:05 AM, Denis Yuen <[email protected]> wrote:

> Hi,
>
> We're running into an issue where workflows that fail and have to be
> re-run (with oozie.wf.rerun.failnodes=true) immediately fail again with the
> message "invalid execution path" in the Oozie log.
>
> The consistent pattern that we observe is that in a workflow with a fork
> (fork_2) leading to a join (join_2) which leads to a fork (fork_3), if a
> failure occurs in the jobs that fork_3 leads to, then on retry, the failure
> will immediately occur. If there is no failure, the workflow executes to
> completion normally. What we've also observed is that if fork_2 leads to a
> number of jobs (bash_cp_3, bash_cp_4, bash_cp_6, bash_cp_8, bash_cp_10,
> bash_cp_12), then the apparently invalid execution paths are any of the
> first five. In other words, if Oozie (seemingly at random) records any of
> the first five as the execution_path for join_2 in its wf_actions table,
> the re-run will fail. Only if "bash_cp_12" is recorded will the workflow
> re-run successfully.
>
> Another thing that might be relevant is that we are using a custom action
> executor that submits to SGE (for legacy reasons). The code is available at
> https://github.com/SeqWare/oozie-sge/tree/1.0.2
> We are running Oozie version 3.3.2-cdh4.5.0.
>
> Is there some API call that we're failing to make in our custom action
> executor that affects the execution path?
> Are we structuring our workflows in some unexpected manner?
> What does an execution path mean for a control node such as a join,
> anyway?
>
> Thanks for any insight!
>
> Large amounts of text follow ....
>
> Relevant error in log:
> 2014-06-05 14:06:01,599 DEBUG SignalXCommand:545 - USER[dyuen] GROUP[-] TOKEN[] APP[HelloWorld] JOB[0000000-140605140030484-oozie-oozi-W] ACTION[0000000-140605140030484-oozie-oozi-W@join_2] STARTED SignalCommand for jobid=0000000-140605140030484-oozie-oozi-W, actionId=0000000-140605140030484-oozie-oozi-W@join_2
> 2014-06-05 14:06:01,600 DEBUG LiteWorkflowInstance:545 - USER[dyuen] GROUP[-] TOKEN[] APP[HelloWorld] JOB[0000000-140605140030484-oozie-oozi-W] ACTION[0000000-140605140030484-oozie-oozi-W@join_2] Signaling job execution path [/bash_cp_3/] signal value [OK]
> 2014-06-05 14:06:01,600 ERROR LiteWorkflowInstance:536 - USER[dyuen] GROUP[-] TOKEN[] APP[HelloWorld] JOB[0000000-140605140030484-oozie-oozi-W] ACTION[0000000-140605140030484-oozie-oozi-W@join_2] invalid execution path [/bash_cp_3/]
> 2014-06-05 14:06:01,601  WARN LiteWorkflowInstance:542 - USER[dyuen] GROUP[-] TOKEN[] APP[HelloWorld] JOB[0000000-140605140030484-oozie-oozi-W] ACTION[0000000-140605140030484-oozie-oozi-W@join_2] Workflow completed [FAILED], failing [0] running nodes
>
> Oozie wf_actions table for the relevant workflow:
>
>                                id                               |           name            | signal_value | status |        transition         |     execution_path
> ----------------------------------------------------------------+---------------------------+--------------+--------+---------------------------+------------------------
>  0000000-140605140030484-oozie-oozi-W@:start:                   | :start:                   | OK           | OK     | start_0                   | /
>  0000000-140605140030484-oozie-oozi-W@start_0                   | start_0                   | OK           | OK     | provisionFile_file_in_0_1 | /
>  0000000-140605140030484-oozie-oozi-W@provisionFile_file_in_0_1 | provisionFile_file_in_0_1 | OK           | OK     | bash_mkdir_2              | /
>  0000000-140605140030484-oozie-oozi-W@bash_mkdir_2              | bash_mkdir_2              | OK           | OK     | fork_2                    | /
>  0000000-140605140030484-oozie-oozi-W@fork_2                    | fork_2                    | OK           | OK     | *                         | /
>  0000000-140605140030484-oozie-oozi-W@bash_cp_3                 | bash_cp_3                 | OK           | OK     | join_2                    | /bash_cp_3/
>  0000000-140605140030484-oozie-oozi-W@bash_cp_4                 | bash_cp_4                 | OK           | OK     | join_2                    | /bash_cp_4/
>  0000000-140605140030484-oozie-oozi-W@bash_cp_6                 | bash_cp_6                 | OK           | OK     | join_2                    | /bash_cp_6/
>  0000000-140605140030484-oozie-oozi-W@bash_cp_8                 | bash_cp_8                 | OK           | OK     | join_2                    | /bash_cp_8/
>  0000000-140605140030484-oozie-oozi-W@bash_cp_10                | bash_cp_10                | OK           | OK     | join_2                    | /bash_cp_10/
>  0000000-140605140030484-oozie-oozi-W@bash_cp_12                | bash_cp_12                | OK           | OK     | join_2                    | /bash_cp_12/
>  0000000-140605140030484-oozie-oozi-W@join_2                    | join_2                    | OK           | OK     | fork_3                    | /bash_cp_3/
>  0000000-140605140030484-oozie-oozi-W@fork_3                    | fork_3                    | OK           | OK     | *                         | /
>  0000000-140605140030484-oozie-oozi-W@provisionFile_out_5       | provisionFile_out_5       | OK           | OK     | join_3                    | /provisionFile_out_5/
>  0000000-140605140030484-oozie-oozi-W@provisionFile_out_7       | provisionFile_out_7       | OK           | OK     | join_3                    | /provisionFile_out_7/
>  0000000-140605140030484-oozie-oozi-W@provisionFile_out_9       | provisionFile_out_9       | OK           | OK     | join_3                    | /provisionFile_out_9/
>  0000000-140605140030484-oozie-oozi-W@provisionFile_out_11      | provisionFile_out_11      | OK           | OK     | join_3                    | /provisionFile_out_11/
>  0000000-140605140030484-oozie-oozi-W@provisionFile_out_13      | provisionFile_out_13      | OK           | OK     | join_3                    | /provisionFile_out_13/
>  0000000-140605140030484-oozie-oozi-W@fail                      | fail                      | OK           | OK     |                           | /bash_cp_14/
> (19 rows)
>
>
> The workflow:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <workflow-app xmlns="uri:oozie:workflow:0.4" name="HelloWorld">
>   <start to="start_0" />
>   <action name="start_0" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/start_0-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/start_0-qsub.opts</options-file>
>     </sge>
>     <ok to="provisionFile_file_in_0_1" />
>     <error to="fail" />
>   </action>
>   <action name="provisionFile_file_in_0_1" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_file_in_0_1-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_file_in_0_1-qsub.opts</options-file>
>     </sge>
>     <ok to="bash_mkdir_2" />
>     <error to="fail" />
>   </action>
>   <action name="bash_mkdir_2" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_mkdir_2-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_mkdir_2-qsub.opts</options-file>
>     </sge>
>     <ok to="fork_2" />
>     <error to="fail" />
>   </action>
>   <fork name="fork_2">
>     <path start="bash_cp_3" />
>     <path start="bash_cp_4" />
>     <path start="bash_cp_6" />
>     <path start="bash_cp_8" />
>     <path start="bash_cp_10" />
>     <path start="bash_cp_12" />
>   </fork>
>   <action name="bash_cp_3" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_3-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_3-qsub.opts</options-file>
>     </sge>
>     <ok to="join_2" />
>     <error to="fail" />
>   </action>
>   <action name="bash_cp_4" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_4-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_4-qsub.opts</options-file>
>     </sge>
>     <ok to="join_2" />
>     <error to="fail" />
>   </action>
>   <action name="bash_cp_6" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_6-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_6-qsub.opts</options-file>
>     </sge>
>     <ok to="join_2" />
>     <error to="fail" />
>   </action>
>   <action name="bash_cp_8" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_8-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_8-qsub.opts</options-file>
>     </sge>
>     <ok to="join_2" />
>     <error to="fail" />
>   </action>
>   <action name="bash_cp_10" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_10-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_10-qsub.opts</options-file>
>     </sge>
>     <ok to="join_2" />
>     <error to="fail" />
>   </action>
>   <action name="bash_cp_12" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_12-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_12-qsub.opts</options-file>
>     </sge>
>     <ok to="join_2" />
>     <error to="fail" />
>   </action>
>   <join name="join_2" to="fork_3" />
>   <fork name="fork_3">
>     <path start="bash_cp_14" />
>     <path start="provisionFile_out_5" />
>     <path start="provisionFile_out_7" />
>     <path start="provisionFile_out_9" />
>     <path start="provisionFile_out_11" />
>     <path start="provisionFile_out_13" />
>   </fork>
>   <action name="bash_cp_14" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_14-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_14-qsub.opts</options-file>
>     </sge>
>     <ok to="join_3" />
>     <error to="fail" />
>   </action>
>   <action name="provisionFile_out_5" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_5-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_5-qsub.opts</options-file>
>     </sge>
>     <ok to="join_3" />
>     <error to="fail" />
>   </action>
>   <action name="provisionFile_out_7" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_7-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_7-qsub.opts</options-file>
>     </sge>
>     <ok to="join_3" />
>     <error to="fail" />
>   </action>
>   <action name="provisionFile_out_9" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_9-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_9-qsub.opts</options-file>
>     </sge>
>     <ok to="join_3" />
>     <error to="fail" />
>   </action>
>   <action name="provisionFile_out_11" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_11-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_11-qsub.opts</options-file>
>     </sge>
>     <ok to="join_3" />
>     <error to="fail" />
>   </action>
>   <action name="provisionFile_out_13" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_13-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_13-qsub.opts</options-file>
>     </sge>
>     <ok to="join_3" />
>     <error to="fail" />
>   </action>
>   <join name="join_3" to="fork_4" />
>   <fork name="fork_4">
>     <path start="bash_cp_15" />
>     <path start="bash_cp_17" />
>     <path start="bash_cp_19" />
>     <path start="bash_cp_21" />
>     <path start="bash_cp_23" />
>   </fork>
>   <action name="bash_cp_15" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_15-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_15-qsub.opts</options-file>
>     </sge>
>     <ok to="join_4" />
>     <error to="fail" />
>   </action>
>   <action name="bash_cp_17" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_17-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_17-qsub.opts</options-file>
>     </sge>
>     <ok to="join_4" />
>     <error to="fail" />
>   </action>
>   <action name="bash_cp_19" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_19-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_19-qsub.opts</options-file>
>     </sge>
>     <ok to="join_4" />
>     <error to="fail" />
>   </action>
>   <action name="bash_cp_21" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_21-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_21-qsub.opts</options-file>
>     </sge>
>     <ok to="join_4" />
>     <error to="fail" />
>   </action>
>   <action name="bash_cp_23" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_23-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_23-qsub.opts</options-file>
>     </sge>
>     <ok to="join_4" />
>     <error to="fail" />
>   </action>
>   <join name="join_4" to="fork_5" />
>   <fork name="fork_5">
>     <path start="provisionFile_out_16" />
>     <path start="provisionFile_out_18" />
>     <path start="provisionFile_out_20" />
>     <path start="provisionFile_out_22" />
>     <path start="provisionFile_out_24" />
>   </fork>
>   <action name="provisionFile_out_16" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_16-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_16-qsub.opts</options-file>
>     </sge>
>     <ok to="join_5" />
>     <error to="fail" />
>   </action>
>   <action name="provisionFile_out_18" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_18-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_18-qsub.opts</options-file>
>     </sge>
>     <ok to="join_5" />
>     <error to="fail" />
>   </action>
>   <action name="provisionFile_out_20" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_20-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_20-qsub.opts</options-file>
>     </sge>
>     <ok to="join_5" />
>     <error to="fail" />
>   </action>
>   <action name="provisionFile_out_22" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_22-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_22-qsub.opts</options-file>
>     </sge>
>     <ok to="join_5" />
>     <error to="fail" />
>   </action>
>   <action name="provisionFile_out_24" retry-max="5" retry-interval="5">
>     <sge xmlns="uri:oozie:sge-action:1.0">
>       <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_24-runner.sh</script>
>       <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_24-qsub.opts</options-file>
>     </sge>
>     <ok to="join_5" />
>     <error to="fail" />
>   </action>
>   <join name="join_5" to="done" />
>   <join name="join_274314800376896" to="done" />
>   <action name="done">
>     <fs>
>       <delete path="hdfs://localhost:8020/user/dyuen/seqware_workflow/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b" />
>     </fs>
>     <ok to="end" />
>     <error to="fail" />
>   </action>
>   <kill name="fail">
>     <message>Java failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
>   </kill>
>   <end name="end" />
> </workflow-app>
>
>
