Hi!
I need some clarification regarding the Oozie launcher job.
1) Is the launcher job launched per workflow application (with several
actions) or per action within a workflow application?
2) Use case: I have workflows that contain multiple shell actions (which
internally run Spark, Hive, Pig etc.). I use shell actions because
additional parameters, such as the partition date, can be computed using
custom logic and passed to Hive through .q files.
Example:
Shell file:
hive -hiveconf DATABASE_NAME="$1" -hiveconf MASTER_TABLE_NAME="$2" \
  -hiveconf SOURCE_TABLE_NAME="$3" -f "$4"
.q file:
use ${hiveconf:DATABASE_NAME};
insert overwrite table ${hiveconf:MASTER_TABLE_NAME}
select * from ${hiveconf:SOURCE_TABLE_NAME};
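To make the "custom logic" part concrete, here is a rough sketch of what such a wrapper script could look like. It is not my exact script: the PARTITION_DATE hiveconf variable, the HIVE_BIN override and the "yesterday" rule are assumptions made for illustration, and the date call assumes GNU date.

```shell
#!/bin/sh
# Sketch of a shell-action wrapper. HIVE_BIN and PARTITION_DATE are
# hypothetical names used for illustration only.
set -eu

# Command used to invoke Hive; overridable so the script can be dry-run.
HIVE_BIN=${HIVE_BIN:-hive}

# Custom logic: use PARTITION_DATE from the environment if it is set,
# otherwise default to yesterday's date (GNU date syntax).
partition_date() {
  if [ -n "${PARTITION_DATE:-}" ]; then
    printf '%s\n' "$PARTITION_DATE"
  else
    date -d yesterday +%Y-%m-%d
  fi
}

# Arguments: database, master table, source table, path to the .q file.
run_hive() {
  "$HIVE_BIN" \
    -hiveconf "DATABASE_NAME=$1" \
    -hiveconf "MASTER_TABLE_NAME=$2" \
    -hiveconf "SOURCE_TABLE_NAME=$3" \
    -hiveconf "PARTITION_DATE=$(partition_date)" \
    -f "$4"
}

# Run only when all four arguments are supplied.
if [ "$#" -eq 4 ]; then
  run_hive "$@"
fi
```

The .q file could then read the value as ${hiveconf:PARTITION_DATE}, in the same way as the other variables.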
I set oozie.launcher.mapreduce.job.queuename and mapreduce.job.queuename
to different queues to avoid starvation of task slots in a single queue.
I also omitted <capture-output></capture-output> from the corresponding
shell action. However, I still see the launcher job occupying a lot of
memory in the launcher queue.
- Is this because the launcher job caches the log output that comes from
Hive?
- Does the launcher job itself need a large memory allocation when a shell
action is run this way?
- What would happen if I explicitly limited the launcher job memory?
I would highly appreciate it if someone could outline the responsibilities
of the Oozie launcher job.
Thanks and Regards,
Niti