Re: Why does Oozie need a Loadbalancer for High Availability?

2019-11-04 Thread Robert Kanter
There's really only the REST API - the CLI actually just makes REST calls.
The REST API is fully documented in the Oozie docs (except for the callback
because it's not a user-facing API).
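Because the CLI is a thin REST client, any HTTP client can talk to Oozie directly. Below is a minimal standard-library sketch. The `/v2/admin/status` path comes from our reading of the Oozie Web Services API docs, and the host name is hypothetical, so check both against your Oozie version.

```python
import json
import urllib.request


def admin_status_url(oozie_url: str) -> str:
    """Build the admin status URL (path assumed from the Web Services API docs)."""
    return oozie_url.rstrip("/") + "/v2/admin/status"


def fetch_status(oozie_url: str, timeout: float = 5.0) -> dict:
    """GET the status endpoint and decode the JSON body, roughly what `oozie admin -status` asks for."""
    with urllib.request.urlopen(admin_status_url(oozie_url), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical address; in an HA setup this would be the load balancer, not one server.
    print(admin_status_url("http://oozie-lb.example.com:11000/oozie/"))
```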

I'm not actually that familiar with virtual IP - I put that in there
because someone else mentioned it could work as an alternative to the load
balancer.  My understanding is that it's essentially like using a load
balancer but without extra hardware because the DNS server handles the
routing.  Basically (at least the way things work now), we need a single
address for the Oozie server, and anything that can accomplish that task
will work.

That said, I believe you can technically get away with not using a load
balancer (or equivalent) and simply having all callbacks and API calls go
to a single Oozie server.  The other Oozie servers would still process
workflows and do things correctly, but the "main" one would be doing most
of the work so it would be very unbalanced.  Though you could mitigate some
of that by telling different users to use different Oozie server
addresses.  You'd also lose the benefit of High Availability because if the
"main" Oozie server goes down, then nobody will know how to contact the
other Oozie servers.
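Here is one way to split users across servers without a load balancer. The server list and the hash-based assignment are hypothetical. The point is that each user is pinned to one address, so nothing fails over when that server is down.

```python
from __future__ import annotations

import hashlib

# Hypothetical Oozie server addresses; without a load balancer each user is told to use one.
SERVERS = [
    "http://oozie1.example.com:11000/oozie",
    "http://oozie2.example.com:11000/oozie",
    "http://oozie3.example.com:11000/oozie",
]


def server_for_user(user: str, servers: list[str] = SERVERS) -> str:
    """Deterministically pin a user to one server by hashing the user name."""
    digest = hashlib.sha256(user.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def reachable_server_for_user(user: str, down: set[str]) -> str | None:
    """Return the pinned server, or None if it is down: there is no fallback address."""
    server = server_for_user(user)
    return None if server in down else server
```

Hashing spreads users roughly evenly, but losing a server still strands every user pinned to it.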

- Robert

Re: Why does Oozie need a Loadbalancer for High Availability?

2019-11-04 Thread Poepping, Thomas
The callouts about Oozie launcher and REST API calls are good ones -- I only 
ever interact with Oozie through the Oozie CLI. Is there any one place where 
all supported ways of interacting with Oozie server are documented?

Robert, does virtual IP (as suggested in the doc) solve the issue you speak of? 
The proxy in Oozie client could round robin between three Virtual IPs, and 
additional Oozie servers could be behind those Virtual IPs? That becomes an 
issue of integration then, rather than implementation on Oozie's part.

Re: Why does Oozie need a Loadbalancer for High Availability?

2019-11-04 Thread Robert Kanter
In addition to the reasons both Andrases mentioned, another reason is that
the client doesn't need to know all of the Oozie server addresses.  While
you can update the Oozie client config on your laptop if it worked that
way, once the Oozie Launcher has started, you can't update the Oozie
servers it knows about.  For example, suppose you had an Oozie launcher
that ran for 3 days - you may have added/removed some Oozie servers in that
time and now the Oozie Launcher's list of Oozie servers would be out of
date.
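The stale-list problem shows up in a toy model: a long-running launcher copies the configured server list when it starts, and later cluster changes never reach it. The `Launcher` class and host names are hypothetical and do not reproduce Oozie's launcher code.

```python
from __future__ import annotations


class Launcher:
    """Toy stand-in for a long-running launcher that captures its config at start-up."""

    def __init__(self, configured_servers: list[str]) -> None:
        # A copy is taken at launch; nothing updates it while the launcher runs.
        self.known_servers = list(configured_servers)

    def stale_entries(self, current_servers: list[str]) -> tuple[set[str], set[str]]:
        """Return (removed servers it still knows, added servers it never learned about)."""
        known, current = set(self.known_servers), set(current_servers)
        return known - current, current - known


cluster = ["oozie1:11000", "oozie2:11000"]
launcher = Launcher(cluster)

# Three days later an admin replaces oozie2 with oozie3.
cluster = ["oozie1:11000", "oozie3:11000"]
removed, added = launcher.stale_entries(cluster)
print("still knows removed:", removed)
print("never learned about:", added)
```

If the launcher instead held a single load-balancer address, server changes behind that address would not affect it.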

- Robert

Re: Why does Oozie need a Loadbalancer for High Availability?

2019-11-04 Thread Andras Piros
Another point to add is that there are lots of users accessing Oozie not
via OozieCLI but via direct REST calls. As I understand it, the proxied
client approach could work for OozieCLI only. In any case, it could make sense
to implement it that way.

Regards,

Andras

Re: Why does Oozie need a Loadbalancer for High Availability?

2019-11-04 Thread Andras Salamon
Hi,

HA was added a long time ago, back in 2013. You can find the JIRA here:
https://issues.apache.org/jira/browse/OOZIE-615. There is a design doc
attached to the JIRA, which could be a good starting point. It has a
section about the load balancer:

"A load balancer, virtual IP, or DNS round robin: This would go in front of the
Oozie servers to (a) provide a single entry point for users so they don’t
have to choose between, or even be aware of, multiple Oozie servers; and (b)
for callbacks from the JobTracker when a Hadoop job is done (which can only
take a single address, and simply choosing an arbitrary Oozie server could
be a problem if that server goes down)."
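The single-address constraint is visible in how a completion callback is configured. Hadoop's `mapreduce.job.end-notification.url` property holds one URL template, and the framework substitutes `$jobId` and `$jobStatus`. The sketch below builds such a value from one base URL. The `/callback?id=...` shape, the host, and the action ID are illustrative assumptions, not a copy of Oozie's code.

```python
from urllib.parse import quote


def end_notification_url(oozie_base_url: str, action_id: str) -> str:
    """Build one callback URL; Hadoop fills in $jobStatus when the job ends.

    The /callback path and query parameters are an illustrative assumption.
    """
    return (
        oozie_base_url.rstrip("/")
        + "/callback?id="
        + quote(action_id, safe="")
        + "&status=$jobStatus"
    )


# Only one URL fits in the property, so it should name the load balancer (or VIP),
# not an individual Oozie server that might be down when the job finishes.
conf = {
    "mapreduce.job.end-notification.url": end_notification_url(
        "http://oozie-lb.example.com:11000/oozie",
        "0000001-191104000000000-oozie-W@mr-node",  # hypothetical action ID
    )
}
print(conf["mapreduce.job.end-notification.url"])
```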

Best,
Sala


On Sat, Nov 2, 2019 at 12:13 AM Poepping, Thomas wrote:

> Hi Oozie development community!
>
> I am looking through documentation for the Oozie High Availability feature
> (https://oozie.apache.org/docs/5.1.0/AG_Install.html#High_Availability_HA)
> and I am wondering why we need to set up virtual IP or load balancing for
> callbacks from the Resource Manager to Oozie? YARN follows a different
> convention – including a proxied client that round robins between DNS names
> configured in a list. Is there something blocking Oozie from doing the
> same, or was this decision made because it also provides users with a
> single endpoint to hit any of the Oozie servers running?
>
> If there aren’t strong arguments against, I would like to open a JIRA to
> implement this. But first, please give me your comments!
>
> Thanks,
> Tom
>
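The YARN-style alternative Tom describes is a client that holds a list of addresses and moves to the next one when a call fails. The minimal sketch below injects a transport function so it runs without a cluster. `RoundRobinFailoverClient`, the transport signature, and the host names are hypothetical. YARN's real failover proxy providers also add retry policies and backoff.

```python
from __future__ import annotations

from typing import Callable

# transport(base_url, path) -> response body; raises ConnectionError if unreachable.
Transport = Callable[[str, str], str]


class RoundRobinFailoverClient:
    def __init__(self, base_urls: list[str], transport: Transport) -> None:
        if not base_urls:
            raise ValueError("need at least one server address")
        self.base_urls = list(base_urls)
        self.transport = transport
        self.current = 0  # index of the server to try first

    def call(self, path: str) -> str:
        """Try each server at most once, starting from the last one that worked."""
        last_error: Exception | None = None
        for _ in range(len(self.base_urls)):
            url = self.base_urls[self.current]
            try:
                return self.transport(url, path)
            except ConnectionError as err:
                last_error = err
                # Fail over: rotate to the next address in the list.
                self.current = (self.current + 1) % len(self.base_urls)
        raise ConnectionError(f"all {len(self.base_urls)} servers failed") from last_error
```

Such a client only helps callers that use it. Direct REST users and job-completion callbacks would still need a single stable address, which is the gap the replies in this thread point out.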