Hi Robert,

Thank you for your reply.

The brackets indicate joins, which are needed when one action has several dependencies. For example, after forking, before I can delete I need to wait until all the direct actions have finished.
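
To make the notation concrete, here is a minimal Python sketch (an illustration only, not Oozie code) that takes the first DAG from the original message quoted below and lists the actions that need a join in front of them because they have more than one dependency, as well as the actions that need a fork after them. The names `dag`, `predecessors`, `join_points` and `fork_points` are made up for this example.

```python
from collections import defaultdict

# First DAG from the original message: action -> actions that run after it.
dag = {
    "start": ["A1", "A2"],
    "A1": ["A7"],
    "A2": ["A7", "A3", "A4"],
    "A3": ["A6"],
    "A7": ["end"],
    "A6": ["end"],
    "A4": ["end"],
}


def predecessors(graph):
    """Invert the graph: action -> list of actions it depends on."""
    preds = defaultdict(list)
    for node, successors in graph.items():
        for succ in successors:
            preds[succ].append(node)
    return dict(preds)


def join_points(graph):
    """Actions with several dependencies, i.e. where a join is needed."""
    return {n: p for n, p in predecessors(graph).items() if len(p) > 1}


def fork_points(graph):
    """Actions with several successors, i.e. where a fork is needed."""
    return {n: s for n, s in graph.items() if len(s) > 1}


joins = join_points(dag)
forks = fork_points(dag)
print("joins:", joins)
print("forks:", forks)
```

In this DAG the forks are at start and A2, while the joins are in front of A7 and end, so the forks and joins do not naturally come in pairs.
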
In my example workflow (quoted below), if all my error transitions go to join_end, does the workflow finish, or does it end up in a deadlock?

Can I set oozie.validate.ForkJoin to false in my job.properties file, or when I start the job from Java?

Thanks,

Étienne

On 16 November 2012 21:59, Robert Kanter <[email protected]> wrote:

> Hi Etienne,
>
> I'm not sure I follow exactly your notation; what do the [ and ] brackets
> indicate?
>
> You can have multiple joins in your workflow, so instead of having
> everything go to join_end, you can have a hierarchy of joins such that each
> fork in your workflow is in pair with a join. There is likely a way to
> re-work your workflow to be like this.
>
> Oozie does some extra checking when you use a fork (e.g. fork and join in
> pair, etc). I'm pretty sure that this is the only place where Oozie will
> enforce these restrictions. In other words, if Oozie were to actually
> start executing your workflow, it wouldn't complain if they're not in a
> pair. You can disable this extra checking by setting
> oozie.validate.ForkJoin to false in your oozie-site.xml. This may be risky
> though and something could go wrong, so do not do this in a production
> cluster.
>
> - Robert
>
> On Fri, Nov 16, 2012 at 8:40 AM, Etienne Dumoulin <
> [email protected]> wrote:
>
> > All,
> >
> > I am trying to create an Oozie xml file from a DAG data structure.
> >
> > My data structure (and actions to run) would look like this:
> > start----->A1,A2
> > A1----> A7
> > A2----->A7,A3,A4
> > A3----->A6
> > A7,A6,A4---->end
> >
> > Now if the actions' outputs are only temporary, I would like to delete
> > them asap. I name the delete actions D:
> > start----->A1,A2
> > A1----> A7
> > A2----->A7,A3,A4
> > A3----->A6,[A3,A4,A7]
> > A7----->D1,[A3,A4,A7]
> > A4----->[A3,A4,A7]
> > A3,A4,A7---->D2
> > A6 ----> D3
> > D1----->join_end
> > D2------>join_end
> > D3------>join_end
> > join_end ----->end
> >
> > I have two questions on forks/joins:
> >
> > The documentation says that I need to have fork and join in pair; is that
> > still true? Or is it only better? I just read a post from Virag from the
> > 19th of July:
> > "To me, the nested forks option you are considering looks good. Its also
> > better to have the join in pair."
> >
> > I would like my workflow to finish even if one branch fails. Let's say
> > that A7 fails; I would like A6 and A4 to proceed. For this type of
> > behaviour, can I link all my error transitions to join_end?
> > It will not be possible if I have to pair forks and joins.
> >
> > Regards,
> >
> > Étienne
