Hi Dipesh,

I'd go for a load balancer that can call out to another service when deciding where to route the next request. Provide the LB address in job.properties.
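
To illustrate the job.properties part, here is a minimal Python sketch, assuming the LB addresses arrive via two environment variables. NAMENODE_LB, JOBTRACKER_LB, the example.com hosts, and the application path are hypothetical placeholders; nameNode and jobTracker are simply the variable names that workflow.xml would reference.

```python
import os


def render_job_properties(env=os.environ):
    """Render job.properties text that points Oozie at load balancer addresses.

    NAMENODE_LB and JOBTRACKER_LB are hypothetical env var names; the
    defaults are placeholder hosts, not real endpoints.
    """
    name_node = env.get("NAMENODE_LB", "hdfs://nn-lb.example.com:8020")
    job_tracker = env.get("JOBTRACKER_LB", "jt-lb.example.com:8032")
    lines = [
        f"nameNode={name_node}",
        f"jobTracker={job_tracker}",
        # workflow.xml would refer to these as ${nameNode} and ${jobTracker};
        # the application path below is an assumed example location.
        "oozie.wf.application.path=${nameNode}/user/${user.name}/workflow",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_job_properties(), end="")
```

Oozie only ever sees the two LB addresses this way; which JT or NN instance actually serves the job is decided behind the LB.
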
Have you seen *Varnish <http://varnish-cache.org/trac/wiki/LoadBalancing>*?
It's a highly configurable LB option that can allegedly *call another script
<https://www.varnish-cache.org/lists/pipermail/varnish-misc/2012-February/021690.html>*.

Andras

--
Andras PIROS
Software Engineer
<http://www.cloudera.com/>

On Wed, Dec 7, 2016 at 8:28 PM, mdk-swandha <[email protected]> wrote:

> @Andras
>
> Hope you understood my use case above. I would appreciate it if you could
> shed some more light on using a load balancer to route jobs while keeping
> this load balancer outside. I would like to know whether you are suggesting
> enabling this load balancer in RM or NN, or writing it in my service.
> Please let me know if anything is not clear in my use case.
>
> Thanks.
>
> On Mon, Dec 5, 2016 at 11:57 AM, mdk-swandha <[email protected]> wrote:
>
> > You mean I have to set env variables for each job/workflow execution and
> > then they will be picked up by Oozie? And I should set them in my service
> > (the service which is finding the best cluster)?
> >
> > For example, let's say I have 3 clusters:
> > - When a job is sent via Oozie/Hue/Zeppelin/Livy etc., they are mapped to
> >   one cluster and jobs always go there. Let's call this the default cluster.
> > - I have a service which determines the best cluster for a given job,
> >   considering various attributes (availability, data locality, network
> >   bandwidth etc.)
> > - This service exposes an API: the caller just passes the required
> >   parameters (job/input/output/queue etc.) and the service returns the
> >   best available cluster.
> >
> > With what I have above, I feel that keeping the calling code in the
> > caller (Oozie/Zeppelin/any application) is the way to go, to keep it
> > simple and to isolate JT's default behavior. This won't disrupt existing jobs
> > which are running on these clusters by introducing some new settings.
> > Maybe I'm missing how you are advising creating a load balancer setting in JT
> > and configuring it at runtime. Can you please tell me more about how this can
> > be done?
> >
> > Thanks.
> > -Dipesh
> >
> > On Mon, Dec 5, 2016 at 10:59 AM, Andras Piros <[email protected]> wrote:
> >
> >> Hi Dipesh,
> >>
> >> during workflow / job submission you can define variables inside
> >> job.properties coming e.g. from env vars that are used in workflow.xml. So
> >> much for the flexibility.
> >>
> >> Can you tell me a use case where runtime routing to different JT / NN
> >> instances via Oozie (and not e.g. coming from a load balancer setting
> >> configured at runtime) is better?
> >>
> >> Thanks,
> >>
> >> Andras
> >>
> >> --
> >> Andras PIROS
> >> Software Engineer
> >> <http://www.cloudera.com/>
> >>
> >> On Mon, Dec 5, 2016 at 7:45 PM, mdk-swandha <[email protected]> wrote:
> >>
> >> > Hi Alex,
> >> >
> >> > The idea is to call this external service which will find the best cluster
> >> > and inform the caller. So today this caller is Oozie, tomorrow it will be
> >> > Zeppelin or any other application.
> >> >
> >> > How can I provide multiple JT and NN addresses in job.properties? You mean
> >> > during job/workflow creation? I will still need to overwrite job.properties
> >> > or provide these values somewhere dynamically?
> >> >
> >> > Thanks.
> >> > -Dipesh
> >> >
> >> > On Mon, Dec 5, 2016 at 5:24 AM, Andras Piros <[email protected]> wrote:
> >> >
> >> > > Hi Dipesh,
> >> > >
> >> > > seems like a bad idea to programmatically change job-tracker or
> >> > > name-node properties - it's just not the task of Oozie to determine
> >> > > the exact JT or NN instances Oozie should use.
> >> > >
> >> > > Instead, I'd rather set up a load balancer for JT and another one for NN,
> >> > > and provide those addresses to Oozie's job.properties.
> >> > > That way, we separate concerns - the load balancer can choose the
> >> > > JT or NN node at runtime, e.g. on a round robin basis.
> >> > >
> >> > > Regards,
> >> > >
> >> > > Andras
> >> > >
> >> > > --
> >> > > Andras PIROS
> >> > > Software Engineer
> >> > > <http://www.cloudera.com/>
> >> > >
> >> > > On Thu, Dec 1, 2016 at 9:29 PM, mdk-swandha <[email protected]> wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > I have a use case like this: in a multi-cluster (Hadoop cluster)
> >> > > > environment, if I would like to send a job/Oozie workflow to a desired
> >> > > > cluster at runtime, how can this be done?
> >> > > >
> >> > > > I see that there is a JavaActionExecutor class which reads NN and
> >> > > > JobTracker in the createBaseHadoopConf method.
> >> > > >
> >> > > > All HadoopActionExecutors are derived from JavaActionExecutor, so this
> >> > > > seems to be a place where I can insert my code. How can I do this
> >> > > > without disrupting the original flow by adding my hook?
> >> > > >
> >> > > > One option is to derive my new JavaActionExecutor, override the
> >> > > > createBaseHadoopConf method, and then derive all ActionExecutors from my
> >> > > > new JavaActionExecutor. It doesn't seem elegant to me, so I thought to
> >> > > > ask out here.
> >> > > >
> >> > > > Any input will be useful.
> >> > > >
> >> > > > Thanks.
> >> > > > -Dipesh
