>> In other words, the Oozie server doesn't find out about launched MR jobs until after they're done.
Where is that information searchable or retrievable inside of Oozie after the MR jobs are done? Through the REST endpoint? Hive might be a more analogous example than MR. If a Hive query kicked off 1:M jobs, where in Oozie can I retrieve those after they have completed? Can I only do it inside of the workflow, or is the data stored somewhere more persistent?

>> You/We should make Crunch and Cascading actions instead of using the Java action. Is there some reason this would be a bad idea?

I'm not opposed to the idea. While working on CRUNCH-272[1] and digging through the Oozie code, I came to think that is needed, based on how the data is inserted into the WorkflowActionBean. For disclosure, I'm not actually looking for Cascading support, but figured that if this was solved generically then both projects would get the benefit.

[1] - https://issues.apache.org/jira/browse/CRUNCH-272


On Wed, May 28, 2014 at 12:16 PM, Robert Kanter <[email protected]> wrote:

> Oozie doesn't actually track the child IDs (except for the MR action, which
> has slightly different behavior in that the launcher job exits
> immediately); it only reports them once the launcher has finished, which
> happens after the actions have actually finished. In other words, the
> Oozie server doesn't find out about launched MR jobs until after they're
> done.
>
> If you're using Hadoop 2.4.0 or later (or a Hadoop with YARN-1461 and
> MAPREDUCE-5699), you should take a look at OOZIE-1722, where Oozie utilizes
> YARN tags to search for jobs that may have already been launched. Would
> the tags be helpful?
>
> That said, I think my original comment on OOZIE-1767 makes sense:
>
> > Why not just make Crunch and Cascading actions? We can then also give them
> > their own sharelibs, handle any other custom logic, and give them easier
> > schemas. I think this would make it easier for other users too.
>
> You/We should make Crunch and Cascading actions instead of using the Java
> action. Is there some reason this would be a bad idea?
>
>
> On Tue, May 27, 2014 at 11:53 AM, Micah Whitacre <[email protected]> wrote:
>
> > So a bit ago I logged OOZIE-1767[1] to help track child jobs that would be
> > launched from running Crunch or Cascading code inside of an Oozie Java
> > action. Oozie currently only tracks the job ids of the launching job and
> > not the ids of the jobs that might get spawned. So while the suggestion on
> > the issue is not necessarily the right solution, it got me thinking about
> > whether or not tracking the child jobs would even be able to solve what I
> > was looking for.
> >
> > I'm curious where those child job ids are stored. Are they retrievable, or
> > are they only usable as a parameter/property inside of the workflow
> > instance (e.g. inside of the workflow spec)? Or is that information
> > stored in a way that I could retrieve it later from Oozie using something
> > like the REST API[2]?
> >
> > [1] - https://issues.apache.org/jira/browse/OOZIE-1767
> > [2] - http://oozie.apache.org/docs/3.3.2/WebServicesAPI.html#Job_Information
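
For reference, a rough sketch of what pulling child job ids back out through the REST API might look like once a workflow has finished. This is only an illustration under several assumptions: that the job-information call (`/v1/job/<job-id>?show=info`) is reachable without authentication, that each entry in the returned `actions` list carries an `externalChildIDs` field, and that the field holds a comma-separated string. The function names and the example server URL are made up for the sketch; whether the field is populated at all depends on the action type, which is the open question in this thread.

```python
import json
from urllib.request import urlopen


def fetch_job_info(oozie_url, job_id):
    """Fetch workflow job info as a dict.

    Assumes an unsecured Oozie server, e.g. oozie_url = "http://host:11000/oozie"
    (hypothetical host).
    """
    with urlopen(f"{oozie_url}/v1/job/{job_id}?show=info") as resp:
        return json.load(resp)


def child_ids_by_action(job_info):
    """Map each action name to the child job ids reported for it.

    Assumes "externalChildIDs" is a comma-separated string; a missing or
    empty value yields an empty list for that action.
    """
    result = {}
    for action in job_info.get("actions", []):
        raw = action.get("externalChildIDs") or ""
        result[action["name"]] = [i.strip() for i in raw.split(",") if i.strip()]
    return result
```

Calling `child_ids_by_action(fetch_job_info(url, job_id))` after the workflow completes would then show, per action, whatever child ids Oozie recorded. An empty list for a Java action would be consistent with the behavior described above.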
