Hi Pierre,

Now that you've mentioned it, I've found that you can disable fork-join
validation either for a single workflow or for all workflows on the server:
https://oozie.apache.org/docs/4.2.0/WorkflowFunctionalSpec.html#a3.1.5_Fork_and_Join_Control_Nodes

"By default, Oozie performs some validation that any forking in a workflow
is valid and won't lead to any incorrect behavior or instability. However,
if Oozie is preventing a workflow from being submitted and you are very
certain that it should work, you can disable forkjoin validation so that
Oozie will accept the workflow. To disable this validation just for a
specific workflow, simply set *oozie.wf.validate.ForkJoin* to false in the
job.properties file. To disable this validation for all workflows, simply
set *oozie.validate.ForkJoin* to false in the oozie-site.xml file.
Disabling this validation is determined by the AND of both of these
properties, so it will be disabled if either or both are set to false and
only enabled if both are set to true (or not specified)."
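
To make the AND rule in the quoted paragraph concrete, here is a minimal Python sketch of how the two settings combine. It is only an illustration of the documented behaviour, not Oozie's actual code: the helper names are made up, and treating any value other than "true" as false is an assumption.

```python
# Illustrative only: mirrors the rule quoted from the docs, not Oozie's code.
WF_KEY = "oozie.wf.validate.ForkJoin"   # set in job.properties (one workflow)
SITE_KEY = "oozie.validate.ForkJoin"    # set in oozie-site.xml (all workflows)


def _flag(props, key):
    """Read a boolean property; an unspecified property counts as true.

    Assumption: any value other than "true" (case-insensitive) is false.
    """
    return str(props.get(key, "true")).strip().lower() == "true"


def fork_join_validation_enabled(job_props, site_props):
    """Validation runs only if both properties are true (or unspecified)."""
    return _flag(job_props, WF_KEY) and _flag(site_props, SITE_KEY)


if __name__ == "__main__":
    job_props = {WF_KEY: "false"}   # per-workflow override in job.properties
    site_props = {}                 # oozie-site.xml leaves the default
    print(fork_join_validation_enabled(job_props, site_props))
```

So for your case, adding oozie.wf.validate.ForkJoin=false to job.properties should be enough, without touching the server configuration.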

You can limit the number of concurrent actions by submitting them to a
dedicated queue in YARN and configuring the scheduler accordingly.

BRs,
Peter

On Mon, Jun 27, 2016 at 5:22 PM, Pierre Villard <[email protected]> wrote:

> Hi Peter,
>
> Thanks a lot for your answer and the useful references to the JIRAs!
> I'll try to have a look at the code and see if this can be improved.
>
> Out of curiosity, what does the 'validation of the XML' step cover? I am
> asking because the 'oozie validate' command itself completes very
> quickly.
>
> Is there a way to "deactivate" this validation part?
>
> In my specific use case, I could use a single fork/join. If I take that
> route, though, I'd like to limit the number of actions that run in
> parallel from the fork. Is that something we can do?
>
> Thanks,
> Pierre.
>
> 2016-06-27 17:01 GMT+02:00 Peter Cseh <[email protected]>:
>
> > Hi Pierre,
> >
> > There was a bugfix around submitting fork jobs which parallelized job
> > submission:
> > https://issues.apache.org/jira/browse/OOZIE-2345
> >
> > But the issue you've reported is known and not resolved yet:
> > https://issues.apache.org/jira/browse/OOZIE-1978
> >
> > I could not find a workaround description, but one sub-workflow per fork
> > may help, as the validation of the XML is the slow part.
> >
> > Best regards,
> > Peter
> >
> > On Mon, Jun 27, 2016 at 4:22 PM, Pierre Villard <[email protected]> wrote:
> >
> > > Hi guys,
> > >
> > > > I am trying to submit workflows with around 50 actions. However,
> > > > depending on how the workflow is defined and on the number of
> > > > actions, the time Oozie needs to accept the workflow varies a lot.
> > > > (I am not talking about the execution time of the actions, but about
> > > > the time between launching 'job -run' on the command line and getting
> > > > back the prompt with my job ID.)
> > >
> > > > The submission time also seems to grow exponentially with the number
> > > > of forks in the workflow (5 forks: a few seconds, 6 forks: 1 minute,
> > > > 7 forks: 10 minutes, 8 forks: 1 hour).
> > >
> > > > I was expecting to have workflows with a higher number of actions.
> > > > Is this a known issue? Is there some tuning to perform? Are there
> > > > workarounds? Should I use sub-workflows?
> > >
> > > Thanks for your help,
> > > Best regards,
> > > Pierre
> > >
> >
> >
> >
> > --
> > Peter Cseh
> > Software Engineer
> > <http://www.cloudera.com>
> >
>



-- 
Peter Cseh
Software Engineer
<http://www.cloudera.com>
