Yes, I think that should be added. It's been waiting on RB for a long time :)
Thanks,
virag

On 2/20/14 2:24 PM, "Alejandro Abdelnur" <[email protected]> wrote:

>This is what I refer to as sharding. It can be seen as a special type of
>fork/join where all shards perform the same actions on different
>datasets, and the number of shards depends on the number of datasets.
>
>A while ago I rewrote the workflow lib, cleaning it up a bit and adding
>this capability, but it never got completed. If there is interest, we
>could create an umbrella JIRA and complete the integration.
>
>Thanks.
>
>
>On Thu, Feb 20, 2014 at 1:47 PM, Mona Chitnis <[email protected]> wrote:
>
>> If you use the sub-workflow construct, it will do some error reporting
>> for you: if a sub-workflow fails, the parent workflow is also updated
>> to failed. Also, in Oozie 4.0, JIRA OOZIE-1264 ("The 'parent' property
>> of a subworkflow should be the ID of the parent workflow") helps you
>> get the dependency graph using IDs.
>>
>>
>> On 2/20/14, 12:52 PM, "Heller, Chris" <[email protected]> wrote:
>>
>> >Mona,
>> >
>> >Thanks. That is the road I'm headed down at the moment.
>> >
>> >I'll create a Java action that takes the files (or a path glob, or
>> >something similar) as input, creates multiple Oozie tasks based on
>> >that input, and then waits for those tasks to complete.
>> >
>> >A feature like this built into the workflow would certainly be nice,
>> >since I think it would integrate error handling better.
>> >
>> >-Chris
>> >
>> >On 2/20/14, 3:43 PM, "Mona Chitnis" <[email protected]> wrote:
>> >
>> >>Hi Chris,
>> >>
>> >>There currently isn't a way to run a dynamic number of parallel tasks
>> >>within the same Oozie workflow XML. But you can do this
>> >>programmatically: using the Oozie Java API, you can start a dynamic
>> >>number of sub-workflows based on the number of outputs.
>> >>
>> >>
>> >>On 2/20/14, 7:05 AM, "Heller, Chris" <[email protected]> wrote:
>> >>
>> >>>Hi,
>> >>>
>> >>>I'm trying to figure out the best way to implement a workflow in
>> >>>Oozie.
>> >>>
>> >>>I am creating a workflow that splits an input into multiple outputs.
>> >>>
>> >>>Then I want to run another process over each output.
>> >>>
>> >>>The trouble is that I cannot know a priori how many outputs I will
>> >>>have, so I don't see how to set up a workflow that runs the next
>> >>>stage to post-process each one.
>> >>>
>> >>>Ideally the next stage would be a fork/join type of scenario, since
>> >>>each output can be processed independently. But I can't see any way
>> >>>to set up the fork paths without using some sort of XML generation
>> >>>preprocessor.
>> >>>
>> >>>Does anyone have a suggestion for how to proceed? Am I stuck doing
>> >>>workflow generation? Or is there another way to structure this
>> >>>workflow using the existing primitives?
>> >>>
>> >>>Thanks,
>> >>>Chris
>> >>
>> >>
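The approach discussed in this thread (a Java step that launches one sub-workflow per output, then waits for all of them, failing if any shard fails) can be sketched roughly as below. This is a minimal, self-contained sketch under stated assumptions, not Oozie's own API. `JobSubmitter`, `FakeSubmitter`, the `inputDir` parameter, and the `hdfs:///apps/post-process` path are all hypothetical. In a real deployment, a `JobSubmitter` implementation would presumably wrap `OozieClient.run(Properties)` and `OozieClient.getJobInfo(jobId).getStatus()`. Only `oozie.wf.application.path` is a real Oozie property name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class ShardLauncher {

    // Simplified job states, loosely modeled on Oozie's WorkflowJob.Status
    // (the real enum also has PREP and SUSPENDED).
    enum Status { RUNNING, SUCCEEDED, FAILED, KILLED }

    // Hypothetical abstraction over job submission; a real implementation
    // might wrap OozieClient.run(...) and OozieClient.getJobInfo(...).
    interface JobSubmitter {
        String submit(Properties conf) throws Exception;
        Status status(String jobId) throws Exception;
    }

    // "Fork": submit one sub-workflow per shard (dataset).
    // "Join": poll until every job is terminal; return true only if all succeeded.
    static boolean runShards(JobSubmitter submitter, String appPath,
                             List<String> shards, long pollMillis) throws Exception {
        List<String> pending = new ArrayList<>();
        for (String shard : shards) {
            Properties conf = new Properties();
            conf.setProperty("oozie.wf.application.path", appPath);
            conf.setProperty("inputDir", shard); // hypothetical sub-workflow parameter
            pending.add(submitter.submit(conf));
        }
        boolean allSucceeded = true;
        while (!pending.isEmpty()) {
            List<String> stillRunning = new ArrayList<>();
            for (String jobId : pending) {
                Status s = submitter.status(jobId);
                if (s == Status.RUNNING) {
                    stillRunning.add(jobId);
                } else if (s != Status.SUCCEEDED) {
                    allSucceeded = false; // one failed shard fails the whole join
                }
            }
            pending = stillRunning;
            if (!pending.isEmpty()) {
                Thread.sleep(pollMillis);
            }
        }
        return allSucceeded;
    }

    // In-memory stand-in for an Oozie server, used only to exercise runShards.
    // Every job reports RUNNING on its first poll; inputs containing "bad" then fail.
    static class FakeSubmitter implements JobSubmitter {
        private final Map<String, String> inputById = new HashMap<>();
        private final Map<String, Integer> pollsById = new HashMap<>();
        private int nextId = 0;

        public String submit(Properties conf) {
            String id = "job-" + nextId++;
            inputById.put(id, conf.getProperty("inputDir"));
            return id;
        }

        public Status status(String jobId) {
            int polls = pollsById.merge(jobId, 1, Integer::sum);
            if (polls < 2) {
                return Status.RUNNING;
            }
            return inputById.get(jobId).contains("bad") ? Status.FAILED : Status.SUCCEEDED;
        }

        int submittedCount() {
            return nextId;
        }
    }

    public static void main(String[] args) throws Exception {
        String app = "hdfs:///apps/post-process"; // hypothetical sub-workflow location
        boolean ok = runShards(new FakeSubmitter(), app,
                Arrays.asList("/out/part-0", "/out/part-1", "/out/part-2"), 10);
        boolean mixed = runShards(new FakeSubmitter(), app,
                Arrays.asList("/out/part-0", "/out/bad-part"), 10);
        System.out.println(ok + " " + mixed); // prints "true false"
    }
}
```

Polling is the simplest way for a Java action to "wait". A production version would also need to treat Oozie's `PREP` and `SUSPENDED` states as non-terminal, bound the total wait time, and decide whether to kill the remaining shards once one fails. Launching real sub-workflows instead of plain jobs keeps the parent/child error reporting Mona mentions.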
