Hi Dipesh,

I'd go for a load balancer that can call another service to decide which
backend is next. The LB address can then be provided in job.properties.
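
As a rough illustration only: a minimal sketch of a job.properties body that
points at load balancer addresses rather than at concrete JT / NN hosts. The
host names and the application path are made-up placeholders, and the helper
function is hypothetical; `nameNode` and `jobTracker` are just the variable
names that workflow.xml files conventionally reference.

```python
def render_job_properties(name_node, job_tracker, app_path):
    """Render a minimal Oozie job.properties body as plain text."""
    lines = [
        f"nameNode={name_node}",
        f"jobTracker={job_tracker}",
        f"oozie.wf.application.path={app_path}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical load balancer addresses (assumptions, not real hosts).
props = render_job_properties(
    "hdfs://nn-lb.example.com:8020",
    "jt-lb.example.com:8032",
    "${nameNode}/user/${user.name}/apps/my-workflow",
)
print(props)
```

With this, workflow.xml keeps using ${nameNode} and ${jobTracker} unchanged,
and the routing decision stays outside Oozie.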

Have you seen Varnish <http://varnish-cache.org/trac/wiki/LoadBalancing>?
It's a highly configurable LB option that can allegedly call another script
<https://www.varnish-cache.org/lists/pipermail/varnish-misc/2012-February/021690.html>.
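
To make the "call another service to decide who's next" idea concrete, here is
a minimal sketch of the decision step, independent of any particular LB
product. Everything specific in it is an assumption: the selector URL, the
JSON request and response shape, and the default cluster addresses are
invented for illustration and are not an API of Oozie, Varnish, or your
service.

```python
import json
import urllib.request

# Hypothetical fallback cluster, used when the selection service is unusable.
DEFAULT_CLUSTER = {
    "nameNode": "hdfs://nn-default.example.com:8020",
    "jobTracker": "jt-default.example.com:8032",
}


def fetch_best_cluster(selector_url, job_params, timeout=5):
    """POST the job attributes as JSON and return the parsed JSON reply."""
    body = json.dumps(job_params).encode("utf-8")
    request = urllib.request.Request(
        selector_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def choose_cluster(job_params, fetch=fetch_best_cluster,
                   selector_url="http://selector.example.com/best-cluster"):
    """Ask the selection service for a cluster; fall back to the default."""
    try:
        cluster = fetch(selector_url, job_params)
    except (OSError, ValueError):
        # Network failure or malformed JSON: keep the default routing.
        return DEFAULT_CLUSTER
    if not isinstance(cluster, dict):
        return DEFAULT_CLUSTER
    if not {"nameNode", "jobTracker"} <= set(cluster):
        return DEFAULT_CLUSTER
    return cluster
```

The fallback means existing jobs keep going to the default cluster whenever
the selection service is down or returns something unexpected.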

Andras

--
Andras PIROS
Software Engineer
<http://www.cloudera.com/>

On Wed, Dec 7, 2016 at 8:28 PM, mdk-swandha <[email protected]>
wrote:

> @Andras
>
> I hope my use case above is clear. I would appreciate it if you could shed
> some more light on using a load balancer to route jobs while keeping the
> load balancer external. Are you suggesting enabling this load balancer in
> the RM or NN, or writing it into my service? Please let me know if anything
> in my use case is unclear.
>
> Thanks.
>
> On Mon, Dec 5, 2016 at 11:57 AM, mdk-swandha <[email protected]>
> wrote:
>
> > You mean I have to set env variables for each job/workflow execution and
> > then it will be picked up by Oozie. And I should set them in my service
> > (the service which is finding the best cluster?).
> >
> > For example, let's say I have 3 clusters:
> > - When a job is sent via Oozie/Hue/Zeppelin/Livy etc., it is mapped to
> > one cluster and jobs always go there. Let's call this the default cluster.
> > - I have a service which determines the best cluster for a given
> > job considering various attributes (availability, data locality, network
> > bandwidth etc.)
> > - This service exposes an API: the caller just passes the required
> > parameters (job/input/output/queue etc.) and the service returns the
> > best available cluster.
> >
> > Given the above, I feel that keeping the calling code in the caller
> > (Oozie/Zeppelin/any application) is the way to go, to keep it simple and
> > to isolate the JT's default behavior. It also avoids disrupting existing
> > jobs running on these clusters by introducing new settings. Maybe I'm
> > missing how you are advising to create a load balancer setting in the JT
> > and configure it at runtime. Can you please tell me more about how this
> > can be done?
> >
> > Thanks.
> > -Dipesh
> >
> >
> >
> > On Mon, Dec 5, 2016 at 10:59 AM, Andras Piros <[email protected]>
> > wrote:
> >
> >> Hi Dipesh,
> >>
> >> during workflow / job submission you can define variables inside
> >> job.properties, coming e.g. from env vars, that are used in workflow.xml.
> >> So much for the flexibility.
> >>
> >> Can you tell me a use case where runtime routing to different JT / NN
> >> instances via Oozie (and not e.g. coming from a load balancer setting
> >> configured at runtime) is better?
> >>
> >> Thanks,
> >>
> >> Andras
> >>
> >> --
> >> Andras PIROS
> >> Software Engineer
> >> <http://www.cloudera.com/>
> >>
> >> On Mon, Dec 5, 2016 at 7:45 PM, mdk-swandha <[email protected]>
> >> wrote:
> >>
> >> > Hi Alex,
> >> >
> >> > The idea is to call this external service, which will find the best
> >> > cluster and inform the caller. Today this caller is Oozie; tomorrow it
> >> > will be Zeppelin or any other application.
> >> >
> >> > How can I provide multiple JT and NN addresses in job.properties? You
> >> > mean during job/workflow creation? Would I still need to overwrite
> >> > job.properties or provide these values somewhere dynamically?
> >> >
> >> > Thanks.
> >> > -Dipesh
> >> >
> >> > On Mon, Dec 5, 2016 at 5:24 AM, Andras Piros <[email protected]>
> >> > wrote:
> >> >
> >> > > Hi Dipesh,
> >> > >
> >> > > it seems like a bad idea to programmatically change the job-tracker
> >> > > or name-node properties: it's just not Oozie's task to determine
> >> > > the exact JT or NN instances it should use.
> >> > >
> >> > > Instead, I'd rather set up a load balancer for JT and another one for
> >> > > NN, and provide those addresses in Oozie's job.properties. That way,
> >> > > we separate concerns: the load balancer can choose the JT or NN node
> >> > > at runtime, e.g. on a round-robin basis.
> >> > >
> >> > > Regards,
> >> > >
> >> > > Andras
> >> > >
> >> > > --
> >> > > Andras PIROS
> >> > > Software Engineer
> >> > > <http://www.cloudera.com/>
> >> > >
> >> > > On Thu, Dec 1, 2016 at 9:29 PM, mdk-swandha <[email protected]>
> >> > > wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > I have a use case like this: in a multi-cluster (Hadoop cluster)
> >> > > > environment, how can I send a job/Oozie workflow to a desired
> >> > > > cluster at runtime?
> >> > > >
> >> > > > I see that there is a JavaActionExecutor class which reads the NN
> >> > > > and JobTracker in its createBaseHadoopConf method.
> >> > > >
> >> > > > All Hadoop action executors are derived from JavaActionExecutor, so
> >> > > > this seems to be a place where I can insert my code. How can I add
> >> > > > my hook without disrupting the original flow?
> >> > > >
> >> > > > One option is to derive my own JavaActionExecutor, override the
> >> > > > createBaseHadoopConf method, and then derive all ActionExecutors
> >> > > > from my new JavaActionExecutor. That doesn't seem elegant to me, so
> >> > > > I thought I'd ask here.
> >> > > >
> >> > > > Any input will be useful.
> >> > > >
> >> > > > Thanks.
> >> > > > -Dipesh
> >> > > >
