GH-49: Supporting bundle in oozie
---------------------------------
Key: OOZIE-89
URL: https://issues.apache.org/jira/browse/OOZIE-89
Project: Oozie
Issue Type: Bug
Reporter: Hadoop QA
Oozie currently has two level of abstractions:
1. Workflow that execute DAG of actions.
2. Coordinator that executes workflow periodically when the specified set of
data directories are available.
This issue proposes another abstraction called 'bundle' that will batch a set
of coordinator applications. The user will be able to
start/stop/suspend/resume/rerun in the bundle level.
******* The proposed high-level requirements to support bundle are enumerated
below:
1. This feature will allow user to specify a list of coordinator applications
in XML file format.
2. The name of the bundle xml file is not hard-coded. User can specify any name
as bundle file.
3. User will submit a bundle by specifying the bundle application path in
config file . An example command is: oozie job -run -config <bundle.properties>
4. Bundle application path is defined in config file as property
"oozie.application.bundle.path" with a value of full path to bundle xml in the
hdfs.
5. User can also submit a bundle job through WS API.
7. User will be able to define variables /parameters for each coordinator
application.
8. All variables should be resolved during job submission. For any resolved
variable, oozie will throw an Exception.
9. User will be able to submit a bundle with an user-defined external id to
avoid duplicate submissions in case of Timeout in first submission.
10. Oozie will not support any explicit dependencies among the coordinator XML
in bundle definition.
11. Oozie will not support any partial bundle submission.
12. When user will submit a bundle , it will get a bundle id to track. Oozie
will put the bundle job into PREP state.
13. User will be able to start a bundle using bundle id. It will put the bundle
job into RUNNING state.
14. User will be able to combine submit and start into run that will start the
bundle immediately.
15. User will be able to optionally specify the kick-off time to determine when
to start a bundle. The bundle will not run until kick-off time reached.
16. User will be able to query Oozie for its status through CLI and WS API.
17. User will be able to query Oozie for all coordinator jobs that it started
through CLI and WS API.
18. User will be able to kill a bundle id that will kill all spawned
coordinator jobs.
19. User will be able to suspend a bundle id that will suspend all spawned
coordinator jobs.
20. User will be able to pause a bundle id with a future time that will pause
all spawned coordinator jobs.
21. User will be able to resume a bundle id that will resume all spawned
coordinator jobs.
22. Bundle rerun requirements TBD.
This is a sample bundle XML :
=========================
<bundle-app name="MY_BUNDLE" xmlns="uri:oozie:bundle:0.1">
<controls>
<kick-off-time>2009-02-02T00:00Z</kick-off-time>
</controls>
<coordinator>
<configuration>
<property>
<name>START_TIME</name>
<value>2009-02-01T00:00Z</value>
<property>
.................
...............
</configuration>
<app-path>hdfs:${NAME_NODE}/tmp/bundle-apps/coordinator1.xml</app-path>
<coordinator>
<coordinator>
<configuration>
<property>
<name>END_TIME</name>
<value>2010-02-01T00:00Z</value>
<property>
.................
...............
</configuration>
<app-path>hdfs:${NAME_NODE}/tmp/bundle-apps/coordinator1.xml</app-path>
<coordinator>
</bundle-app>
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira