Re: LIBPROCESS_IP

2016-10-16 Thread Jie Yu
>
> Does your proposal include this?

If we don't set LIBPROCESS_IP, libprocess will bind to 0.0.0.0 by default.

- Jie
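
As a rough illustration of the default described above, the bind-address choice can be sketched as follows. This is a minimal Python sketch, not libprocess code: the function name `bind_address` and the dict-based environment are assumptions made only for this example.

```python
import os


def bind_address(env=None):
    """Return the IP to bind to (hypothetical helper, not a libprocess API).

    Uses LIBPROCESS_IP when it is set and non-empty; otherwise falls back
    to the wildcard address 0.0.0.0, i.e. all interfaces.
    """
    env = os.environ if env is None else env
    return env.get("LIBPROCESS_IP") or "0.0.0.0"
```

Passing an explicit dict instead of reading `os.environ` keeps the sketch easy to try out without touching the real process environment.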


Re: LIBPROCESS_IP

2016-10-16 Thread haosdent
Sounds great!

>libprocess should always bind to 0.0.0.0

Does your proposal include this?


Re: LIBPROCESS_IP

2016-10-16 Thread Jie Yu
OK, guys. Thanks for the input! Here is my proposal:

1) If the container uses the host network, the Mesos agent will set
LIBPROCESS_ADVERTISE_IP to the agent IP. This is for the case where DNS is
not configured properly on the host (we don't need to do that if DNS is
configured properly). By doing this, libprocess will skip the hostname lookup
and advertise LIBPROCESS_ADVERTISE_IP directly.

2) If the container uses a non-host network and defines port mappings (e.g.,
bridge), the Mesos agent will not set any libprocess env variables. Given that
there could be multiple mapped ports, the Mesos agent doesn't know how to set
LIBPROCESS_ADVERTISE_PORT. So it's the framework's responsibility to set
LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT properly in this case
(through CommandInfo.environment).

3) If the container uses a non-host network and does not define port mappings
(e.g., IP per container), the Mesos agent will not set any libprocess env
variables. In this case, both the CNI isolator and the Docker engine will
properly set up DNS in the container, so hostname lookup should work.

- Jie
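
The three cases in the proposal can be summarized as a small decision function. This is only a sketch of the rules as stated in this message, written in Python for readability; the function name, the `network_mode` strings, and the "responsible party" labels are assumptions made for the example and are not Mesos identifiers.

```python
def libprocess_env_from_agent(network_mode, has_port_mapping, agent_ip):
    """Return (env vars the agent would set, who handles advertising).

    Hypothetical helper mirroring the three proposal cases:
      1) host network            -> agent sets LIBPROCESS_ADVERTISE_IP
      2) non-host + port mapping -> agent sets nothing; framework decides
      3) non-host, no mapping    -> agent sets nothing; DNS in the container
    """
    if network_mode == "host":
        return {"LIBPROCESS_ADVERTISE_IP": agent_ip}, "agent"
    if has_port_mapping:
        # Several ports may be mapped, so the agent cannot pick
        # LIBPROCESS_ADVERTISE_PORT on the framework's behalf.
        return {}, "framework"
    return {}, "container DNS"
```

For example, `libprocess_env_from_agent("bridge", True, "10.0.0.5")` yields an empty environment and names the framework as the party that must set the advertise variables.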


Re: LIBPROCESS_IP

2016-10-15 Thread tommy xiao
good point, +1


Re: LIBPROCESS_IP

2016-10-12 Thread Jie Yu
Stephan,

I think the only time the framework needs to set LIBPROCESS_ADVERTISE_IP is
when DNAT is necessary for the container (e.g., bridge). In that case,
LIBPROCESS_ADVERTISE_IP should always be the agent IP, and
LIBPROCESS_ADVERTISE_PORT the relevant host port allocated for the container.
In other cases, the framework should not do anything.

- Jie
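
For the DNAT (bridge) case, a framework could build the environment it passes through CommandInfo.environment roughly as below. This is a hedged Python sketch: the dict layout (a `variables` list of `name`/`value` pairs) is assumed to mirror the JSON form of the Mesos `Environment` message, and `advertise_environment` is a name invented for this example.

```python
def advertise_environment(agent_ip, host_port):
    """Build an Environment-shaped dict for the bridge/DNAT case.

    Hypothetical helper: the layout is an assumption about the JSON
    rendering of CommandInfo.environment, not a verified API.
    """
    if not 0 < int(host_port) < 65536:
        raise ValueError("host_port must be a valid TCP port")
    return {
        "variables": [
            {"name": "LIBPROCESS_ADVERTISE_IP", "value": agent_ip},
            # Environment values are strings, so the port is stringified.
            {"name": "LIBPROCESS_ADVERTISE_PORT", "value": str(host_port)},
        ]
    }
```

The host port here is whichever mapped port the framework chose for libprocess traffic, which is exactly the choice the agent cannot make on its own.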


Re: LIBPROCESS_IP

2016-10-12 Thread Alex Rukletsov
>
> Also, I think libprocess should always bind to 0.0.0.0, rather than doing a
> hostname lookup and binding to the IP found for the hostname.
> LIBPROCESS_ADVERTISE_IP can be used to override the IP address it wants to
> advertise to peers. If that's not specified, it'll try to do a hostname
> lookup to guess a routable IP.
>

I'm +1 for this change. Here is one more argument.

A master or agent always has a single unique UPID, which is tied to a
specific IP, obtained either via a hostname lookup or set manually.
However, the way the IP is obtained influences how a master or agent binds
to network interfaces: a single one if LIBPROCESS_IP is set and *all*
available interfaces otherwise. This leads to confusion: sometimes you
can use any interface on the master machine to query a master endpoint, but
sometimes not (e.g. if you set the --ip master flag), while agents always
communicate using one specific interface.
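
To make the "one UPID, one advertised IP" point concrete, here is a minimal Python sketch of how the advertised IP could be resolved. It is not the libprocess implementation: the precedence (LIBPROCESS_ADVERTISE_IP, then LIBPROCESS_IP, then a hostname lookup) is my reading of the behavior discussed in this thread, and the `advertised_ip` name and the injectable `lookup` parameter are assumptions made for the example.

```python
import socket


def advertised_ip(env, lookup=None):
    """Pick the IP that would end up in the UPID (illustrative only)."""
    # An explicit advertise address wins over everything else.
    if env.get("LIBPROCESS_ADVERTISE_IP"):
        return env["LIBPROCESS_ADVERTISE_IP"]
    # Assumed fallback: a manually configured bind IP is also advertised.
    if env.get("LIBPROCESS_IP"):
        return env["LIBPROCESS_IP"]
    # Last resort: guess a routable IP from the local hostname.
    if lookup is None:
        lookup = lambda: socket.gethostbyname(socket.gethostname())
    return lookup()
```

The `lookup` argument exists only so the hostname-resolution step can be replaced in tests or on hosts where DNS is misconfigured.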

Some related links to the code:
https://github.com/apache/mesos/blob/c9b707aa86d55714ec419ad10190db22ec38108b/3rdparty/libprocess/src/process.cpp#L976
https://github.com/apache/mesos/blob/c9b707aa86d55714ec419ad10190db22ec38108b/3rdparty/libprocess/src/process.cpp#L899
https://github.com/apache/mesos/blob/c9b707aa86d55714ec419ad10190db22ec38108b/src/master/main.cpp#L233
https://github.com/apache/mesos/blob/c9b707aa86d55714ec419ad10190db22ec38108b/3rdparty/libprocess/src/process.cpp#L3282


Re: LIBPROCESS_IP

2016-10-12 Thread Erb, Stephan
>Framework should be the one that sets
>LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT appropriately if it
>tries to launch another Mesos framework so that Master can reach the new
>framework.

As a framework/executor author this is not possible in all scenarios: There is 
no way to discover IP addresses assigned via CNI before the first StatusUpdate 
has been received. It is therefore not possible to set LIBPROCESS_ADVERTISE_IP 
appropriately at launch time. 

Please see https://issues.apache.org/jira/browse/MESOS-6281 for details.
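
The timing problem can be illustrated with a small Python sketch that pulls container IPs out of a status update once one has arrived. The field path used here (`container_status` -> `network_infos` -> `ip_addresses` -> `ip_address`) is an assumption about how a status update looks when rendered as a dict; the `container_ips` helper is invented for this example.

```python
def container_ips(status_update):
    """Collect IP addresses reported in a task status update dict.

    Illustrative only: before the first status update arrives there is
    nothing to inspect, which is why the address cannot be known at
    launch time.
    """
    container_status = status_update.get("container_status", {})
    ips = []
    for info in container_status.get("network_infos", []):
        for address in info.get("ip_addresses", []):
            if "ip_address" in address:
                ips.append(address["ip_address"])
    return ips
```

Since this information only exists after launch, it cannot feed an environment variable that has to be fixed in the launch request.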


Re: LIBPROCESS_IP

2016-10-11 Thread Avinash Sridharan
Valid point. Makes sense to drive this decision from the user and the
framework.

On Tue, Oct 11, 2016 at 9:32 PM, Jie Yu  wrote:

> >
> > While I believe this particular logic of setting LIBPROCESS_ADVERTISE_IP
> > to agent IP can be done in the agent (it could look at the port mapping
> > as well)
>
>
> What if there are multiple port mappings? How can the agent decide which
> port to use as LIBPROCESS_ADVERTISE_PORT?
>
> On Tue, Oct 11, 2016 at 9:27 PM, Avinash Sridharan 
> wrote:
>
> > Definitely a +1 for executor binding to 0.0.0.0, instead of doing a
> > `gethostname` and `getaddrinfo`. But I am assuming these semantics would
> > kick in only if LIBPROCESS_IP is not set, which should be the norm.
> >
> > +1 for LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT and the onus
> > being on the frameworks to set these variables. I guess the framework can
> > set LIBPROCESS_ADVERTISE_IP to the agent IP and
> > LIBPROCESS_ADVERTISE_PORT to the host port when it specifies a
> > port mapping. While I believe this particular logic of setting
> > LIBPROCESS_ADVERTISE_IP to the agent IP can be done in the agent (it
> > could look at the port mapping as well), when to actually set these
> > variables (whether the executors even need to advertise their IP
> > addresses) is a decision that the frameworks should be privy to and not
> > left to the agent.
> >
> > On Tue, Oct 11, 2016 at 7:31 PM, haosdent  wrote:
> >
> > > > libprocess should always bind to 0.0.0.0
> > > + 1 for this
> > >
> > > On Wed, Oct 12, 2016 at 2:33 AM, Jie Yu  wrote:
> > >
> > > > > Hi folks,
> > > > >
> > > > > I was in the process of cleaning up some tech debt related to env
> > > > > variables in our code base. I created an epic ticket to track. I
> > > > > searched relevant tickets filed previously, and found MESOS-3740. I
> > > > > did some digging on how we handle LIBPROCESS_IP currently, and here
> > > > > are my findings:
> > > >
> > > > > 1) We always set LIBPROCESS_IP in the executor environment variables:
> > > > > https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L6793-L6796
> > > > >
> > > > > This is not an issue for an executor that runs on the host network.
> > > > > However, if the executor wants to run on a non-host network (e.g.,
> > > > > overlay), this might be problematic, because libprocess for the
> > > > > executor will try to bind to LIBPROCESS_IP, but the IP is not valid
> > > > > inside the container.
> > > >
> > > > 2) As mentioned in MESOS-3740, some users want to run a Mesos
> > > > framework in a Mesos container. The old style framework driver
> > > > assumes a 2 way communication channel between the framework and the
> > Mesos
> > > > master. In order for the master to reach the framework running
> inside a
> > > > Mesos container, the framework's libprocess should advertise its ip
> and
> > > > port properly. This problem gets tricky because the networking for
> the
> > > > Mesos container:
> > > >
> > > > 2.a) If the container uses host network, libprocess should bind to
> > > 0.0.0.0,
> > > > and advertise itself using the agent ip and the relevant port
> > > > 2.b) If the container has a routable ip (e.g., using calico or
> > overlay),
> > > > libprocess should still bind to 0.0.0.0, and advertise itself using
> the
> > > > container ip and the relevant port. Currently, it binds to agent ip
> > > (which
> > > > will fail), and advertise itself using agent ip and the port in the
> > > > container (which will fail as well)
> > > > 2.c) If the container has a private ip (e.g., bridge), libprocess
> > should
> > > > still bind to 0.0.0.0, and advertise itself using the agent ip and
> > > _mapped_
> > > > host port. Currently, it binds to agent ip (which will fail), and
> > > advertise
> > > > itself using agent ip and the port in the container (which will fail
> as
> > > > well)
> > > >
> > > > Therefore, the workaround suggested in MESOS-3740 is not ideal. It does
> > > > not consider 2.b) and 2.c)
> > > >
> > > > Libprocess now supports both LIBPROCESS_IP and
> LIBPROCESS_ADVERTISE_IP
> > so
> > > > the bind address does not have to be the address that is being
> > > advertised.
> > > >
> > > > For the 2.c) case, Mesos don't have a way to determine the advertise
> > port
> > > > (mapped port). This information is only known to the framework (which
> > > host
> > > > port it'll use to serve as the mapped port for the libprocess).
> > > >
> > > > Given that, I think Mesos should not blindly set LIBPROCESS_IP to
> agent
> > IP
> > > > in executor environment variables. Framework should be the one that
> > sets
> > > > LIBPROCESS_ADVERTISE_I

Re: LIBPROCSES_IP

2016-10-11 Thread Jie Yu
>
> While I believe this particular logic of setting LIBPROCESS_ADVERTISE_IP
> to agent IP can be done in the agent (it could look at the port mapping
> as well)


What if there are multiple port mappings? How can the agent decide which
port to use as LIBPROCESS_ADVERTISE_PORT?
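
To make the ambiguity concrete, here is a minimal sketch in Python (not
Mesos code). The helper name `infer_advertise_port` and the
`(host_port, container_port)` tuple representation are hypothetical and
only serve to illustrate the question: with exactly one mapping the agent
could guess, but with several it has no basis for choosing.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation of a port mapping: (host_port, container_port).
PortMapping = Tuple[int, int]


def infer_advertise_port(mappings: Sequence[PortMapping]) -> Optional[int]:
    """Return the host port to advertise, or None if it cannot be inferred.

    Only the framework knows which mapped port libprocess listens behind,
    so this sketch refuses to guess unless there is exactly one mapping.
    """
    if len(mappings) == 1:
        host_port, _container_port = mappings[0]
        return host_port
    return None
```

With a single mapping such as `[(31000, 5050)]` the sketch returns the host
port; with two or more mappings it returns `None`, which is the case where
the framework would have to supply LIBPROCESS_ADVERTISE_PORT itself.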

On Tue, Oct 11, 2016 at 9:27 PM, Avinash Sridharan 
wrote:

> Definitely a +1 for executor binding to 0.0.0.0, instead of doing a
> `gethostname` and `getaddrinfo`. But I am assuming this semantics would
> kick in only if LIBPROCESS_IP is not set, which should be the norm.
>
> +1 for LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT and the onus
> being on the frameworks to set these variables. I guess the framework can
> set the LIBPROCESS_ADVERTISE_IP to the agent IP and
> LIBPROCESS_ADVERTISE_PORT to the host port when it specifies a
> port-mapping. While I believe this particular logic of
> setting LIBPROCESS_ADVERTISE_IP to agent IP can be done in the agent (it
> could look at the port mapping as well), when to actually set these
> variables (whether the executors even need to advertise their IP addresses)
> is a decision that the frameworks should be privy to and not left to the
> agent.
>
> On Tue, Oct 11, 2016 at 7:31 PM, haosdent  wrote:
>
> > > libprocess should always bind to 0.0.0.0
> > + 1 for this
> >
> > On Wed, Oct 12, 2016 at 2:33 AM, Jie Yu  wrote:
> >
> > > Hi folks,
> > >
> > > I was in the process of cleaning up some tech debt related to env
> > > variables in our code base. I created an epic ticket to track it. I
> > > searched for relevant tickets filed previously, and found MESOS-3740.
> > > I did some digging on how we handle LIBPROCESS_IP currently, and here
> > > are my findings:
> > >
> > > 1) We always set LIBPROCESS_IP in the executor environment variables:
> > > https://github.com/apache/mesos/blob/master/src/slave/
> > > slave.cpp#L6793-L6796
> > >
> > > This is not an issue for an executor that runs on host network.
> However,
> > if
> > > the executor wants to run on non-host network (e.g., overlay), this
> might
> > > be problematic, because libprocess for the executor will try to bind to
> > > LIBPROCESS_IP, but the IP is not valid inside the container.
> > >
> > > 2) As mentioned in MESOS-3740, some users want to run a Mesos
> > > framework in a Mesos container. The old style framework driver
> > > assumes a 2 way communication channel between the framework and the
> Mesos
> > > master. In order for the master to reach the framework running inside a
> > > Mesos container, the framework's libprocess should advertise its ip and
> > > port properly. This problem gets tricky because the networking for the
> > > Mesos container:
> > >
> > > 2.a) If the container uses host network, libprocess should bind to
> > 0.0.0.0,
> > > and advertise itself using the agent ip and the relevant port
> > > 2.b) If the container has a routable ip (e.g., using calico or
> overlay),
> > > libprocess should still bind to 0.0.0.0, and advertise itself using the
> > > container ip and the relevant port. Currently, it binds to agent ip
> > (which
> > > will fail), and advertise itself using agent ip and the port in the
> > > container (which will fail as well)
> > > 2.c) If the container has a private ip (e.g., bridge), libprocess
> should
> > > still bind to 0.0.0.0, and advertise itself using the agent ip and
> > _mapped_
> > > host port. Currently, it binds to agent ip (which will fail), and
> > advertise
> > > itself using agent ip and the port in the container (which will fail as
> > > well)
> > >
> > > Therefore, the workaround suggested in MESOS-3740 is not ideal. It does
> > > not consider 2.b) and 2.c)
> > >
> > > Libprocess now supports both LIBPROCESS_IP and LIBPROCESS_ADVERTISE_IP
> so
> > > the bind address does not have to be the address that is being
> > advertised.
> > >
> > > For the 2.c) case, Mesos don't have a way to determine the advertise
> port
> > > (mapped port). This information is only known to the framework (which
> > host
> > > port it'll use to serve as the mapped port for the libprocess).
> > >
> > > Given that, I think Mesos should not blindly set LIBPROCESS_IP to agent
> IP
> > > in executor environment variables. Framework should be the one that
> sets
> > > LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT appropriately if
> it
> > > tries to launch another Mesos framework so that Master can reach the
> new
> > > framework. If the framework just wants to launch a regular container
> that
> > > does not depends on libprocess, it should simply not set these env
> > > variables.
> > >
> > > Also, I think libprocess should always bind to 0.0.0.0, 

Re: LIBPROCSES_IP

2016-10-11 Thread Avinash Sridharan
Definitely a +1 for executor binding to 0.0.0.0, instead of doing a
`gethostname` and `getaddrinfo`. But I am assuming these semantics would
kick in only if LIBPROCESS_IP is not set, which should be the norm.

+1 for LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT and the onus
being on the frameworks to set these variables. I guess the framework can
set LIBPROCESS_ADVERTISE_IP to the agent IP and
LIBPROCESS_ADVERTISE_PORT to the host port when it specifies a
port-mapping. While I believe this particular logic of
setting LIBPROCESS_ADVERTISE_IP to agent IP can be done in the agent (it
could look at the port mapping as well), when to actually set these
variables (whether the executors even need to advertise their IP addresses)
is a decision that the frameworks should be privy to and not left to the
agent.

On Tue, Oct 11, 2016 at 7:31 PM, haosdent  wrote:

> > libprocess should always bind to 0.0.0.0
> + 1 for this
>
> On Wed, Oct 12, 2016 at 2:33 AM, Jie Yu  wrote:
>
> > Hi folks,
> >
> > I was in the process of cleaning up some tech debt related to env
> > variables in our code base. I created an epic ticket to track it. I
> > searched for relevant tickets filed previously, and found MESOS-3740.
> > I did some digging on how we handle LIBPROCESS_IP currently, and here
> > are my findings:
> >
> > 1) We always set LIBPROCESS_IP in the executor environment variables:
> > https://github.com/apache/mesos/blob/master/src/slave/
> > slave.cpp#L6793-L6796
> >
> > This is not an issue for an executor that runs on host network. However,
> if
> > the executor wants to run on non-host network (e.g., overlay), this might
> > be problematic, because libprocess for the executor will try to bind to
> > LIBPROCESS_IP, but the IP is not valid inside the container.
> >
> > 2) As mentioned in MESOS-3740, some users want to run a Mesos
> > framework in a Mesos container. The old style framework driver
> > assumes a 2 way communication channel between the framework and the Mesos
> > master. In order for the master to reach the framework running inside a
> > Mesos container, the framework's libprocess should advertise its ip and
> > port properly. This problem gets tricky because the networking for the
> > Mesos container:
> >
> > 2.a) If the container uses host network, libprocess should bind to
> 0.0.0.0,
> > and advertise itself using the agent ip and the relevant port
> > 2.b) If the container has a routable ip (e.g., using calico or overlay),
> > libprocess should still bind to 0.0.0.0, and advertise itself using the
> > container ip and the relevant port. Currently, it binds to agent ip
> (which
> > will fail), and advertise itself using agent ip and the port in the
> > container (which will fail as well)
> > 2.c) If the container has a private ip (e.g., bridge), libprocess should
> > still bind to 0.0.0.0, and advertise itself using the agent ip and
> _mapped_
> > host port. Currently, it binds to agent ip (which will fail), and
> advertise
> > itself using agent ip and the port in the container (which will fail as
> > well)
> >
> > Therefore, the workaround suggested in MESOS-3740 is not ideal. It does
> > not consider 2.b) and 2.c)
> >
> > Libprocess now supports both LIBPROCESS_IP and LIBPROCESS_ADVERTISE_IP so
> > the bind address does not have to be the address that is being
> advertised.
> >
> > For the 2.c) case, Mesos don't have a way to determine the advertise port
> > (mapped port). This information is only known to the framework (which
> host
> > port it'll use to serve as the mapped port for the libprocess).
> >
> > Given that, I think Mesos should not blindly set LIBPROCESS_IP to agent IP
> > in executor environment variables. Framework should be the one that sets
> > LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT appropriately if it
> > tries to launch another Mesos framework so that Master can reach the new
> > framework. If the framework just wants to launch a regular container that
> > does not depends on libprocess, it should simply not set these env
> > variables.
> >
> > Also, I think libprocess should always bind to 0.0.0.0, rather than
> doing a
> > hostname lookup and bind to the IP found for the hostname.
> > LIBPROCESS_ADVERTISE_IP can be used to overwrite the ip address it wants
> to
> > advertise to peers. If that's not specified, it'll try to do a hostname
> > lookup to guess a routable ip.
> >
> > Thoughts?
> > - Jie
> >
>
>
>
> --
> Best Regards,
> Haosdent Huang
>



-- 
Avinash Sridharan, Mesosphere
+1 (323) 702 5245


Re: LIBPROCSES_IP

2016-10-11 Thread haosdent
> libprocess should always bind to 0.0.0.0
+ 1 for this

On Wed, Oct 12, 2016 at 2:33 AM, Jie Yu  wrote:

> Hi folks,
>
> I was in the process of cleaning up some tech debt related to env variables
> in our code base. I created an epic ticket to track it. I searched for
> relevant tickets filed previously, and found MESOS-3740. I did some digging
> on how we handle LIBPROCESS_IP currently, and here are my findings:
>
> 1) We always set LIBPROCESS_IP in the executor environment variables:
> https://github.com/apache/mesos/blob/master/src/slave/
> slave.cpp#L6793-L6796
>
> This is not an issue for an executor that runs on host network. However, if
> the executor wants to run on non-host network (e.g., overlay), this might
> be problematic, because libprocess for the executor will try to bind to
> LIBPROCESS_IP, but the IP is not valid inside the container.
>
> 2) As mentioned in MESOS-3740, some users want to run a Mesos
> framework in a Mesos container. The old style framework driver
> assumes a 2 way communication channel between the framework and the Mesos
> master. In order for the master to reach the framework running inside a
> Mesos container, the framework's libprocess should advertise its ip and
> port properly. This problem gets tricky because the networking for the
> Mesos container:
>
> 2.a) If the container uses host network, libprocess should bind to 0.0.0.0,
> and advertise itself using the agent ip and the relevant port
> 2.b) If the container has a routable ip (e.g., using calico or overlay),
> libprocess should still bind to 0.0.0.0, and advertise itself using the
> container ip and the relevant port. Currently, it binds to agent ip (which
> will fail), and advertise itself using agent ip and the port in the
> container (which will fail as well)
> 2.c) If the container has a private ip (e.g., bridge), libprocess should
> still bind to 0.0.0.0, and advertise itself using the agent ip and _mapped_
> host port. Currently, it binds to agent ip (which will fail), and advertise
> itself using agent ip and the port in the container (which will fail as
> well)
>
> Therefore, the workaround suggested in MESOS-3740 is not ideal. It does
> not consider 2.b) and 2.c)
>
> Libprocess now supports both LIBPROCESS_IP and LIBPROCESS_ADVERTISE_IP so
> the bind address does not have to be the address that is being advertised.
>
> For the 2.c) case, Mesos don't have a way to determine the advertise port
> (mapped port). This information is only known to the framework (which host
> port it'll use to serve as the mapped port for the libprocess).
>
> Given that, I think Mesos should not blindly set LIBPROCESS_IP to agent IP
> in executor environment variables. Framework should be the one that sets
> LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT appropriately if it
> tries to launch another Mesos framework so that Master can reach the new
> framework. If the framework just wants to launch a regular container that
> does not depends on libprocess, it should simply not set these env
> variables.
>
> Also, I think libprocess should always bind to 0.0.0.0, rather than doing a
> hostname lookup and bind to the IP found for the hostname.
> LIBPROCESS_ADVERTISE_IP can be used to overwrite the ip address it wants to
> advertise to peers. If that's not specified, it'll try to do a hostname
> lookup to guess a routable ip.
>
> Thoughts?
> - Jie
>



-- 
Best Regards,
Haosdent Huang


LIBPROCSES_IP

2016-10-11 Thread Jie Yu
Hi folks,

I was in the process of cleaning up some tech debt related to env variables
in our code base. I created an epic ticket to track it. I searched for
relevant tickets filed previously, and found MESOS-3740. I did some digging
on how we handle LIBPROCESS_IP currently, and here are my findings:

1) We always set LIBPROCESS_IP in the executor environment variables:
https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L6793-L6796

This is not an issue for an executor that runs on the host network. However,
if the executor wants to run on a non-host network (e.g., overlay), this might
be problematic, because libprocess for the executor will try to bind to
LIBPROCESS_IP, but the IP is not valid inside the container.

2) As mentioned in MESOS-3740, some users want to run a Mesos framework in
a Mesos container. The old style framework driver assumes a 2 way
communication channel between the framework and the Mesos master. In order
for the master to reach the framework running inside a Mesos container, the
framework's libprocess should advertise its ip and port properly. This
problem gets tricky because of the networking for the Mesos container:

2.a) If the container uses host network, libprocess should bind to 0.0.0.0,
and advertise itself using the agent ip and the relevant port.
2.b) If the container has a routable ip (e.g., using calico or overlay),
libprocess should still bind to 0.0.0.0, and advertise itself using the
container ip and the relevant port. Currently, it binds to agent ip (which
will fail), and advertises itself using agent ip and the port in the
container (which will fail as well).
2.c) If the container has a private ip (e.g., bridge), libprocess should
still bind to 0.0.0.0, and advertise itself using the agent ip and _mapped_
host port. Currently, it binds to agent ip (which will fail), and advertises
itself using agent ip and the port in the container (which will fail as
well).
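
The three cases above can be summarized in a small Python sketch. This is
only an illustration of the desired behavior, not libprocess or Mesos code:
the `Network` enum, the `Endpoint` type and the `desired_endpoint` function
are hypothetical names introduced here, and the addresses in the usage note
are made up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Network(Enum):
    HOST = "host"          # 2.a) container shares the agent's network
    ROUTABLE = "routable"  # 2.b) routable container ip (calico, overlay)
    BRIDGE = "bridge"      # 2.c) private ip behind a port mapping


@dataclass(frozen=True)
class Endpoint:
    bind_ip: str
    advertise_ip: str
    advertise_port: int


def desired_endpoint(
    network: Network,
    agent_ip: str,
    container_ip: str,
    container_port: int,
    mapped_host_port: Optional[int] = None,
) -> Endpoint:
    """Return what libprocess should bind to and advertise in each case."""
    if network is Network.HOST:
        return Endpoint("0.0.0.0", agent_ip, container_port)
    if network is Network.ROUTABLE:
        return Endpoint("0.0.0.0", container_ip, container_port)
    # Bridge: the advertised port must be the mapped host port, which only
    # the framework knows, so it has to be passed in explicitly.
    if mapped_host_port is None:
        raise ValueError("bridge network requires the mapped host port")
    return Endpoint("0.0.0.0", agent_ip, mapped_host_port)
```

For example, `desired_endpoint(Network.BRIDGE, "10.0.0.5", "172.17.0.2",
5050, 31000)` binds to 0.0.0.0 and advertises 10.0.0.5:31000. In every case
the bind address is 0.0.0.0; only the advertised ip and port differ.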

Therefore, the workaround suggested in MESOS-3740 is not ideal. It does not
consider 2.b) and 2.c).

Libprocess now supports both LIBPROCESS_IP and LIBPROCESS_ADVERTISE_IP so
the bind address does not have to be the address that is being advertised.

For the 2.c) case, Mesos doesn't have a way to determine the advertise port
(mapped port). This information is only known to the framework (which host
port it'll use to serve as the mapped port for the libprocess).

Given that, I think Mesos should not blindly set LIBPROCESS_IP to agent IP
in executor environment variables. The framework should be the one that sets
LIBPROCESS_ADVERTISE_IP and LIBPROCESS_ADVERTISE_PORT appropriately if it
tries to launch another Mesos framework so that the master can reach the new
framework. If the framework just wants to launch a regular container that
does not depend on libprocess, it should simply not set these env
variables.
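
As a rough sketch of what that would mean on the framework side (Python,
with a hypothetical helper name `libprocess_env`; how the resulting
variables get attached to the task is deliberately left out):

```python
from typing import Dict, Optional


def libprocess_env(
    uses_libprocess: bool,
    agent_ip: Optional[str] = None,
    mapped_host_port: Optional[int] = None,
) -> Dict[str, str]:
    """Build the libprocess-related env variables a framework would set.

    A regular container that does not depend on libprocess gets none.
    """
    if not uses_libprocess:
        return {}
    if agent_ip is None or mapped_host_port is None:
        raise ValueError("agent ip and mapped host port are both required")
    return {
        "LIBPROCESS_ADVERTISE_IP": agent_ip,
        "LIBPROCESS_ADVERTISE_PORT": str(mapped_host_port),
    }
```

Here the framework, which picked the host port for the mapping, is the only
party that supplies it; the agent never has to guess.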

Also, I think libprocess should always bind to 0.0.0.0, rather than doing a
hostname lookup and binding to the IP found for the hostname.
LIBPROCESS_ADVERTISE_IP can be used to override the ip address it wants to
advertise to peers. If that's not specified, it'll try to do a hostname
lookup to guess a routable ip.
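
In Python terms, the proposed resolution order would look roughly like the
sketch below. This is not the libprocess implementation; the function names
are hypothetical, and binding to LIBPROCESS_IP when it is explicitly set is
an assumption of this sketch rather than part of the proposal above.

```python
import socket
from typing import Callable, Mapping, Tuple


def lookup_hostname_ip() -> str:
    """Guess a routable ip by resolving the local hostname."""
    return socket.gethostbyname(socket.gethostname())


def resolve_addresses(
    env: Mapping[str, str],
    lookup: Callable[[], str] = lookup_hostname_ip,
) -> Tuple[str, str]:
    """Return (bind_ip, advertise_ip) for the given environment."""
    # Assumption: an explicitly set LIBPROCESS_IP still wins for binding;
    # otherwise bind to all interfaces.
    bind_ip = env.get("LIBPROCESS_IP", "0.0.0.0")
    # The advertised ip is taken from the env variable if present, and
    # only falls back to a hostname lookup when it is missing.
    advertise_ip = env.get("LIBPROCESS_ADVERTISE_IP")
    if advertise_ip is None:
        advertise_ip = lookup()
    return bind_ip, advertise_ip
```

The `lookup` parameter exists only so the fallback can be replaced in tests;
with LIBPROCESS_ADVERTISE_IP set, no hostname lookup happens at all.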

Thoughts?
- Jie