Hi Denis,

This sounds like it could be OOZIE-1879 <https://issues.apache.org/jira/browse/OOZIE-1879>, which I recently committed a patch for. The issue there was a workflow in which an action after a fork failed, and on rerun you'd get an "invalid execution path" error.

It wasn't easy to track down, but it turns out that during a rerun, Oozie goes through all of the actions, and for a fork it processes the forked actions in the order they are listed in the fork's XML. If the forked actions finished in a different order during the original run, you get this error.

A workaround would be to list the actions in the fork in the order that they're likely to complete, but that's probably not practical. Otherwise, you'll need the OOZIE-1879 patch to fix this.
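
For what it's worth, here is a rough sketch of what that workaround might look like for the fork_2 in your workflow. It assumes (and this is only a guess from your wf_actions dump, where join_2 was recorded with execution path /bash_cp_3/) that bash_cp_3 tends to be the last of the forked actions to finish, so it is moved to the end of the list. The relative order of the other paths is likewise a placeholder, not something I can know from here; you would have to order them by how long each job usually takes.

```xml
<!-- Hypothetical reordering of fork_2: paths listed in the order the
     actions are expected to complete, slowest last. The assumption that
     bash_cp_3 finishes last is illustrative only. -->
<fork name="fork_2">
    <path start="bash_cp_4" />
    <path start="bash_cp_6" />
    <path start="bash_cp_8" />
    <path start="bash_cp_10" />
    <path start="bash_cp_12" />
    <path start="bash_cp_3" />
</fork>
```

Since completion order can vary from run to run, this only lowers the odds of hitting the error; it doesn't remove the underlying problem.
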
- Robert

On Fri, Jun 20, 2014 at 8:05 AM, Denis Yuen <[email protected]> wrote:

> Hi,
>
> We're running into an issue where workflows that fail and have to be
> re-run (with oozie.wf.rerun.failnodes=true) immediately fail again with a
> message in the Oozie log "invalid execution path."
>
> The consistent pattern that we observe is that in a workflow with a fork
> (fork_2) leading to a join (join_2) which leads to a fork (fork_3), if a
> failure occurs in the jobs that fork_3 leads to, then on retry, the failure
> will immediately occur. If there is no failure, the workflow executes to
> completion normally. What we've also observed is that if fork_2 leads to a
> number of jobs (bash_cp_3, bash_cp_4, bash_cp_6, bash_cp_8, bash_cp_10,
> bash_cp_12), then the apparently invalid execution paths are any of the
> first five. In other words, if any of the first five are seemingly randomly
> set by Oozie in Oozie's wf_actions table for the execution_path for join_2,
> the re-run will fail. Only if "bash_cp_12" is set will the workflow
> successfully re-run.
>
> Another thing that might be relevant is that we are using a custom action
> executor that submits to SGE (for legacy reasons). The code is available at
> https://github.com/SeqWare/oozie-sge/tree/1.0.2 This is with Oozie
> version 3.3.2-cdh4.5.0
>
> Are there any thoughts on whether there is some API call that we're
> failing to make in our custom action executor that affects execution path?
> Are we structuring our workflows in some unexpected manner?
> What is the meaning of an execution path for a control node such as join
> anyways?
>
> Thanks for any insight!
>
> Large amounts of text follow ....
> > Relevant error in log: > 2014-06-05 14:06:01,599 DEBUG SignalXCommand:545 - USER[dyuen] GROUP[-] > TOKEN[] APP[HelloWorld] JOB[0000000-140605140030484-oozie-oozi-W] > ACTION[0000000-140605140030484-oozie-oozi-W@join_2] STARTED SignalCommand > for jobid=0000000-140605140030484-oozie-oozi-W, > actionId=0000000-140605140030484-oozie-oozi-W@join_2 > 2014-06-05 14:06:01,600 DEBUG LiteWorkflowInstance:545 - USER[dyuen] > GROUP[-] TOKEN[] APP[HelloWorld] JOB[0000000-140605140030484-oozie-oozi-W] > ACTION[0000000-140605140030484-oozie-oozi-W@join_2] Signaling job > execution path [/bash_cp_3/] signal value [OK] > 2014-06-05 14:06:01,600 ERROR LiteWorkflowInstance:536 - USER[dyuen] > GROUP[-] TOKEN[] APP[HelloWorld] JOB[0000000-140605140030484-oozie-oozi-W] > ACTION[0000000-140605140030484-oozie-oozi-W@join_2] invalid execution > path [/bash_cp_3/] > 2014-06-05 14:06:01,601 WARN LiteWorkflowInstance:542 - USER[dyuen] > GROUP[-] TOKEN[] APP[HelloWorld] JOB[0000000-140605140030484-oozie-oozi-W] > ACTION[0000000-140605140030484-oozie-oozi-W@join_2] Workflow completed > [FAILED], failing [0] running nodes > Oozie wf_actions table for the relevant workflow: > > id | > name | signal_value | status | transition | > execution_path > > ----------------------------------------------------------------+---------------------------+--------------+--------+---------------------------+------------------------ > 0000000-140605140030484-oozie-oozi-W@:start: | :start: > | OK | OK | start_0 | / > 0000000-140605140030484-oozie-oozi-W@start_0 | start_0 > | OK | OK | provisionFile_file_in_0_1 | / > 0000000-140605140030484-oozie-oozi-W@provisionFile_file_in_0_1 | > provisionFile_file_in_0_1 | OK | OK | bash_mkdir_2 > | / > 0000000-140605140030484-oozie-oozi-W@bash_mkdir_2 | > bash_mkdir_2 | OK | OK | fork_2 > | / > 0000000-140605140030484-oozie-oozi-W@fork_2 | fork_2 > | OK | OK | * | / > 0000000-140605140030484-oozie-oozi-W@bash_cp_3 | > bash_cp_3 | OK | OK | join_2 > | /bash_cp_3/ > 
0000000-140605140030484-oozie-oozi-W@bash_cp_4 | > bash_cp_4 | OK | OK | join_2 > | /bash_cp_4/ > 0000000-140605140030484-oozie-oozi-W@bash_cp_6 | > bash_cp_6 | OK | OK | join_2 > | /bash_cp_6/ > 0000000-140605140030484-oozie-oozi-W@bash_cp_8 | > bash_cp_8 | OK | OK | join_2 > | /bash_cp_8/ > 0000000-140605140030484-oozie-oozi-W@bash_cp_10 | > bash_cp_10 | OK | OK | join_2 > | /bash_cp_10/ > 0000000-140605140030484-oozie-oozi-W@bash_cp_12 | > bash_cp_12 | OK | OK | join_2 > | /bash_cp_12/ > 0000000-140605140030484-oozie-oozi-W@join_2 | join_2 > | OK | OK | fork_3 | > /bash_cp_3/ > 0000000-140605140030484-oozie-oozi-W@fork_3 | fork_3 > | OK | OK | * | / > 0000000-140605140030484-oozie-oozi-W@provisionFile_out_5 | > provisionFile_out_5 | OK | OK | join_3 > | /provisionFile_out_5/ > 0000000-140605140030484-oozie-oozi-W@provisionFile_out_7 | > provisionFile_out_7 | OK | OK | join_3 > | /provisionFile_out_7/ > 0000000-140605140030484-oozie-oozi-W@provisionFile_out_9 | > provisionFile_out_9 | OK | OK | join_3 > | /provisionFile_out_9/ > 0000000-140605140030484-oozie-oozi-W@provisionFile_out_11 | > provisionFile_out_11 | OK | OK | join_3 > | /provisionFile_out_11/ > 0000000-140605140030484-oozie-oozi-W@provisionFile_out_13 | > provisionFile_out_13 | OK | OK | join_3 > | /provisionFile_out_13/ > 0000000-140605140030484-oozie-oozi-W@fail | fail > | OK | OK | | > /bash_cp_14/ > (19 rows) > > > The workflow: > > <?xml version="1.0" encoding="UTF-8"?> > <workflow-app xmlns="uri:oozie:workflow:0.4" name="HelloWorld"> > <start to="start_0" /> > <action name="start_0" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/start_0-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/start_0-qsub.opts</options-file> > </sge> > <ok to="provisionFile_file_in_0_1" /> > <error to="fail" /> > </action> > <action 
name="provisionFile_file_in_0_1" retry-max="5" > retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_file_in_0_1-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_file_in_0_1-qsub.opts</options-file> > </sge> > <ok to="bash_mkdir_2" /> > <error to="fail" /> > </action> > <action name="bash_mkdir_2" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_mkdir_2-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_mkdir_2-qsub.opts</options-file> > </sge> > <ok to="fork_2" /> > <error to="fail" /> > </action> > <fork name="fork_2"> > <path start="bash_cp_3" /> > <path start="bash_cp_4" /> > <path start="bash_cp_6" /> > <path start="bash_cp_8" /> > <path start="bash_cp_10" /> > <path start="bash_cp_12" /> > </fork> > <action name="bash_cp_3" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_3-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_3-qsub.opts</options-file> > </sge> > <ok to="join_2" /> > <error to="fail" /> > </action> > <action name="bash_cp_4" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_4-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_4-qsub.opts</options-file> > </sge> > <ok to="join_2" /> > <error to="fail" /> > </action> > <action name="bash_cp_6" retry-max="5" retry-interval="5"> > <sge 
xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_6-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_6-qsub.opts</options-file> > </sge> > <ok to="join_2" /> > <error to="fail" /> > </action> > <action name="bash_cp_8" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_8-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_8-qsub.opts</options-file> > </sge> > <ok to="join_2" /> > <error to="fail" /> > </action> > <action name="bash_cp_10" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_10-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_10-qsub.opts</options-file> > </sge> > <ok to="join_2" /> > <error to="fail" /> > </action> > <action name="bash_cp_12" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_12-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_12-qsub.opts</options-file> > </sge> > <ok to="join_2" /> > <error to="fail" /> > </action> > <join name="join_2" to="fork_3" /> > <fork name="fork_3"> > <path start="bash_cp_14" /> > <path start="provisionFile_out_5" /> > <path start="provisionFile_out_7" /> > <path start="provisionFile_out_9" /> > <path start="provisionFile_out_11" /> > <path start="provisionFile_out_13" /> > </fork> > <action name="bash_cp_14" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > 
<script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_14-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_14-qsub.opts</options-file> > </sge> > <ok to="join_3" /> > <error to="fail" /> > </action> > <action name="provisionFile_out_5" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_5-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_5-qsub.opts</options-file> > </sge> > <ok to="join_3" /> > <error to="fail" /> > </action> > <action name="provisionFile_out_7" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_7-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_7-qsub.opts</options-file> > </sge> > <ok to="join_3" /> > <error to="fail" /> > </action> > <action name="provisionFile_out_9" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_9-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_9-qsub.opts</options-file> > </sge> > <ok to="join_3" /> > <error to="fail" /> > </action> > <action name="provisionFile_out_11" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_11-runner.sh</script> > > 
<options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_11-qsub.opts</options-file> > </sge> > <ok to="join_3" /> > <error to="fail" /> > </action> > <action name="provisionFile_out_13" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_13-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_13-qsub.opts</options-file> > </sge> > <ok to="join_3" /> > <error to="fail" /> > </action> > <join name="join_3" to="fork_4" /> > <fork name="fork_4"> > <path start="bash_cp_15" /> > <path start="bash_cp_17" /> > <path start="bash_cp_19" /> > <path start="bash_cp_21" /> > <path start="bash_cp_23" /> > </fork> > <action name="bash_cp_15" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_15-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_15-qsub.opts</options-file> > </sge> > <ok to="join_4" /> > <error to="fail" /> > </action> > <action name="bash_cp_17" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_17-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_17-qsub.opts</options-file> > </sge> > <ok to="join_4" /> > <error to="fail" /> > </action> > <action name="bash_cp_19" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_19-runner.sh</script> > > 
<options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_19-qsub.opts</options-file> > </sge> > <ok to="join_4" /> > <error to="fail" /> > </action> > <action name="bash_cp_21" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_21-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_21-qsub.opts</options-file> > </sge> > <ok to="join_4" /> > <error to="fail" /> > </action> > <action name="bash_cp_23" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_23-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/bash_cp_23-qsub.opts</options-file> > </sge> > <ok to="join_4" /> > <error to="fail" /> > </action> > <join name="join_4" to="fork_5" /> > <fork name="fork_5"> > <path start="provisionFile_out_16" /> > <path start="provisionFile_out_18" /> > <path start="provisionFile_out_20" /> > <path start="provisionFile_out_22" /> > <path start="provisionFile_out_24" /> > </fork> > <action name="provisionFile_out_16" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_16-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_16-qsub.opts</options-file> > </sge> > <ok to="join_5" /> > <error to="fail" /> > </action> > <action name="provisionFile_out_18" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_18-runner.sh</script> > > 
<options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_18-qsub.opts</options-file> > </sge> > <ok to="join_5" /> > <error to="fail" /> > </action> > <action name="provisionFile_out_20" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_20-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_20-qsub.opts</options-file> > </sge> > <ok to="join_5" /> > <error to="fail" /> > </action> > <action name="provisionFile_out_22" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_22-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_22-qsub.opts</options-file> > </sge> > <ok to="join_5" /> > <error to="fail" /> > </action> > <action name="provisionFile_out_24" retry-max="5" retry-interval="5"> > <sge xmlns="uri:oozie:sge-action:1.0"> > > <script>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_24-runner.sh</script> > > <options-file>/usr/tmp/oozie/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b/generated-scripts/provisionFile_out_24-qsub.opts</options-file> > </sge> > <ok to="join_5" /> > <error to="fail" /> > </action> > <join name="join_5" to="done" /> > <join name="join_274314800376896" to="done" /> > <action name="done"> > <fs> > <delete > path="hdfs://localhost:8020/user/dyuen/seqware_workflow/oozie-8d157b87-5f1a-496f-b66c-8374cd05233b" > /> > </fs> > <ok to="end" /> > <error to="fail" /> > </action> > <kill name="fail"> > <message>Java failed, error > message[${wf:errorMessage(wf:lastErrorNode())}]</message> > </kill> > <end name="end" /> > </workflow-app> > >
