Do you mean that I have to set env variables for each job/workflow execution,
and they will then be picked up by Oozie? And that I should set them in my
service (the service that finds the best cluster)?
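One way to picture the env-var suggestion is the following minimal sketch, which is not Oozie code. The process that submits the workflow (a wrapper script or the cluster-selection service) fills in the cluster-specific entries of job.properties from environment variables, and workflow.xml keeps referring to them as ${nameNode} and ${jobTracker}. The variable names NAME_NODE and JOB_TRACKER and the default addresses are hypothetical.

```java
import java.util.Map;
import java.util.Properties;

// Sketch only: builds the cluster-specific entries of job.properties from
// environment variables. NAME_NODE and JOB_TRACKER are hypothetical names;
// nameNode/jobTracker are the keys workflow.xml would reference as
// ${nameNode} and ${jobTracker}.
final class EnvJobProperties {
    private EnvJobProperties() {}

    static Properties fromEnv(Map<String, String> env,
                              String defaultNameNode,
                              String defaultJobTracker) {
        Properties props = new Properties();
        // An unset or blank variable falls back to the default cluster.
        props.setProperty("nameNode", pick(env.get("NAME_NODE"), defaultNameNode));
        props.setProperty("jobTracker", pick(env.get("JOB_TRACKER"), defaultJobTracker));
        return props;
    }

    private static String pick(String value, String fallback) {
        return (value == null || value.trim().isEmpty()) ? fallback : value;
    }

    public static void main(String[] args) {
        // Hypothetical default-cluster addresses.
        Properties props = fromEnv(System.getenv(),
                "hdfs://nn-default.example.com:8020",
                "jt-default.example.com:8032");
        props.list(System.out); // print the resolved key/value pairs
    }
}
```

Taking the environment as a Map parameter, rather than calling System.getenv() inside fromEnv, keeps the logic testable.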

For example, let's say I have 3 clusters:
- When a job is sent via Oozie/Hue/Zeppelin/Livy etc., it is mapped to one
cluster and always goes there. Let's call this the default cluster.
- I have a service which determines the best cluster for a given job,
considering various attributes (availability, data locality, network
bandwidth etc.).
- This service exposes an API: the caller passes the required parameters
(job/input/output/queue etc.) and the service returns the best available
cluster.
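The selection service described above could rank clusters in many ways. The sketch below assumes one simple possibility: each attribute is pre-normalized to [0, 1], and the cluster with the highest weighted sum wins. The class names, fields and weights are hypothetical and only illustrate the shape of such an API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical candidate cluster with normalized scores in [0, 1].
final class ClusterCandidate {
    final String name;
    final double availability;   // e.g. fraction of free capacity
    final double dataLocality;   // e.g. fraction of the job's input stored there
    final double bandwidth;      // e.g. normalized network bandwidth to the data

    ClusterCandidate(String name, double availability, double dataLocality, double bandwidth) {
        this.name = name;
        this.availability = availability;
        this.dataLocality = dataLocality;
        this.bandwidth = bandwidth;
    }
}

// Hypothetical weighted-sum ranking; the weights are illustrative assumptions.
final class ClusterSelector {
    static final double W_AVAILABILITY = 0.4;
    static final double W_LOCALITY = 0.4;
    static final double W_BANDWIDTH = 0.2;

    private ClusterSelector() {}

    static double score(ClusterCandidate c) {
        return W_AVAILABILITY * c.availability
                + W_LOCALITY * c.dataLocality
                + W_BANDWIDTH * c.bandwidth;
    }

    // Best-scoring cluster, or empty when there are no candidates.
    static Optional<ClusterCandidate> best(List<ClusterCandidate> candidates) {
        return candidates.stream()
                .max(Comparator.comparingDouble(ClusterSelector::score));
    }
}
```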

Given the above, I feel that putting the calling code in the caller
(Oozie/Zeppelin/any application) is the way to go: it keeps things simple
and leaves the JT's default behavior untouched, so existing jobs running on
these clusters won't be disrupted by new settings. Maybe I'm missing how you
are suggesting to create a load balancer setting for the JT and configure it
at runtime. Can you please tell me more about how this can be done?
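The caller-side approach can be sketched as follows, assuming a hypothetical BestClusterService client for the selection API. The caller copies its usual job properties and overrides only nameNode and jobTracker when the service returns an answer. If the service returns nothing or fails, the default cluster is kept, so existing jobs behave as before. Submitting the resulting properties (e.g. through Oozie's client) is left to each caller.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

// Hypothetical NN/JT address pair returned by the selection service.
final class ClusterAddress {
    final String nameNode;
    final String jobTracker;

    ClusterAddress(String nameNode, String jobTracker) {
        this.nameNode = nameNode;
        this.jobTracker = jobTracker;
    }
}

// Hypothetical client-side view of the selection API.
interface BestClusterService {
    // Empty when the service has no recommendation.
    Optional<ClusterAddress> bestCluster(Map<String, String> request);
}

final class CallerSideRouting {
    private CallerSideRouting() {}

    // Returns a routed copy; baseProps itself is never modified.
    static Properties route(Properties baseProps,
                            Map<String, String> request,
                            BestClusterService service) {
        Properties routed = new Properties();
        routed.putAll(baseProps);
        Optional<ClusterAddress> choice;
        try {
            choice = service.bestCluster(request);
        } catch (RuntimeException e) {
            choice = Optional.empty(); // service unavailable: keep the default cluster
        }
        choice.ifPresent(c -> {
            routed.setProperty("nameNode", c.nameNode);
            routed.setProperty("jobTracker", c.jobTracker);
        });
        return routed;
    }
}
```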

Thanks.
-Dipesh



On Mon, Dec 5, 2016 at 10:59 AM, Andras Piros <[email protected]>
wrote:

> Hi Dipesh,
>
> During workflow / job submission you can define variables inside
> job.properties, whose values can come e.g. from env vars, and use them in
> workflow.xml. So much for the flexibility.
>
> Can you tell me a use case where routing to different JT / NN instances at
> runtime via Oozie (rather than e.g. via a load balancer configured at
> runtime) is better?
>
> Thanks,
>
> Andras
>
> --
> Andras PIROS
> Software Engineer
> <http://www.cloudera.com/>
>
> On Mon, Dec 5, 2016 at 7:45 PM, mdk-swandha <[email protected]>
> wrote:
>
> > Hi Alex,
> >
> > The idea is to call this external service, which will find the best
> > cluster and inform the caller. Today this caller is Oozie; tomorrow it
> > will be Zeppelin or any other application.
> >
> > How can I provide multiple JT and NN addresses in job.properties? Do you
> > mean during job/workflow creation? Will I still need to overwrite
> > job.properties or provide these values dynamically somewhere?
> >
> > Thanks.
> > -Dipesh
> >
> > On Mon, Dec 5, 2016 at 5:24 AM, Andras Piros <[email protected]>
> > wrote:
> >
> > > Hi Dipesh,
> > >
> > > Changing the job-tracker or name-node properties programmatically seems
> > > like a bad idea: it's just not Oozie's task to determine exactly which
> > > JT or NN instances it should use.
> > >
> > > Instead, I'd rather set up one load balancer for the JT and another for
> > > the NN, and provide those addresses in Oozie's job.properties. That way,
> > > we separate concerns: the load balancer can choose the JT or NN node at
> > > runtime, e.g. on a round-robin basis.
> > >
> > > Regards,
> > >
> > > Andras
> > >
> > >
> > > On Thu, Dec 1, 2016 at 9:29 PM, mdk-swandha <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I have the following use case: in a multi-cluster (Hadoop cluster)
> > > > environment, how can I send a job/Oozie workflow to a desired cluster
> > > > at runtime?
> > > >
> > > > I see that the JavaActionExecutor class reads the NN and JobTracker in
> > > > its createBaseHadoopConf method.
> > > >
> > > > All Hadoop action executors derive from JavaActionExecutor, so this
> > > > seems to be a place where I can insert my code. How can I add my hook
> > > > without disrupting the original flow?
> > > >
> > > > One option is to derive my own JavaActionExecutor, override the
> > > > createBaseHadoopConf method, and then derive all ActionExecutors from
> > > > my new JavaActionExecutor. That doesn't seem elegant to me, so I thought
> > > > I'd ask here.
> > > >
> > > > Any input will be useful.
> > > >
> > > > Thanks.
> > > > -Dipesh
> > > >
> > >
> >
>
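The load-balancer alternative from the quoted reply works differently. Oozie's job.properties points at a single JT address and a single NN address, and the balancer behind each address picks a node at runtime, e.g. round robin. The policy itself is small. The sketch below only illustrates round robin over hypothetical endpoint addresses. It is not a Hadoop or Oozie API, and a real setup would use an actual load balancer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin picker over hypothetical JT (or NN) endpoints.
final class RoundRobin {
    private final List<String> endpoints;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobin(List<String> endpoints) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("at least one endpoint is required");
        }
        this.endpoints = Collections.unmodifiableList(new ArrayList<>(endpoints));
    }

    // Lock-free and thread-safe; floorMod keeps the index non-negative
    // even after the counter overflows past Integer.MAX_VALUE.
    String pick() {
        int i = Math.floorMod(counter.getAndIncrement(), endpoints.size());
        return endpoints.get(i);
    }
}
```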
