[jira] [Updated] (SPARK-5113) Audit and document use of hostnames and IP addresses in Spark

2015-02-08 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5113:
-
Component/s: Spark Core

 Audit and document use of hostnames and IP addresses in Spark
 -

 Key: SPARK-5113
 URL: https://issues.apache.org/jira/browse/SPARK-5113
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Patrick Wendell
Priority: Critical

 Spark has multiple network components that start servers and advertise their 
 network addresses to other processes.
 We should go through each of these components and make sure they have 
 consistent and/or documented behavior with respect to (a) what interface(s) 
 they bind to and (b) what hostname they use to advertise themselves to other 
 processes. We should document this clearly and explain to people what to do 
 in different cases (e.g. EC2, dockerized containers, etc.).
 When Spark initializes, it searches the network interfaces until it finds one 
 whose address is not a loopback address. It then does a reverse DNS lookup 
 for a hostname associated with that interface, and the network components use 
 that hostname to advertise themselves to other processes. That hostname is 
 also the one used for the Akka system identifier (Akka supports supplying 
 only a single name, which it uses both as the bind interface and as the actor 
 identifier). In some cases, that hostname is also used as the bind hostname 
 (e.g. I think this happens in the connection manager and possibly Akka), 
 which will likely result internally in the hostname being re-resolved to an 
 IP address. In other cases (the web UI and netty shuffle) we seem to bind to 
 all interfaces.
 The best outcome would be to have three configs that can be set on each 
 machine:
 {code}
 SPARK_LOCAL_IP # IP address we bind to for all services
 SPARK_INTERNAL_HOSTNAME # Hostname we advertise to remote processes within the cluster
 SPARK_EXTERNAL_HOSTNAME # Hostname we advertise to processes outside the cluster (e.g. the UI)
 {code}
 It's not clear how easily we can support that scheme while providing 
 backwards compatibility. The last one (SPARK_EXTERNAL_HOSTNAME) is easy - 
 it's just an alias for what is now SPARK_PUBLIC_DNS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5113) Audit and document use of hostnames and IP addresses in Spark

2015-01-06 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5113:
---
Description: 
Spark has multiple network components that start servers and advertise their 
network addresses to other processes.

We should go through each of these components and make sure they have 
consistent and/or documented behavior with respect to (a) what interface(s) 
they bind to and (b) what hostname they use to advertise themselves to other 
processes. We should document this clearly and explain to people what to do in 
different cases (e.g. EC2, dockerized containers, etc.).
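
To make the distinction between (a) and (b) concrete, here is a minimal Python sketch, not Spark code. The names `ServiceEndpoint` and `start_service`, and the hostname in the usage note, are hypothetical and chosen only for illustration. It shows a listener whose bind address and advertised hostname are tracked as two separate values.

```python
import socket
from dataclasses import dataclass


@dataclass
class ServiceEndpoint:
    """Hypothetical record: where a server listens vs. what it tells peers."""
    bind_address: str      # local address the socket is bound to
    advertised_host: str   # hostname handed out to other processes
    port: int              # port chosen by the operating system


def start_service(bind_address, advertised_host):
    """Bind a TCP listener and report the bind and advertised addresses separately."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((bind_address, 0))  # port 0 lets the OS pick a free port
    server.listen()
    port = server.getsockname()[1]
    return server, ServiceEndpoint(bind_address, advertised_host, port)
```

For example, `start_service("0.0.0.0", "node1.internal.example")` would listen on all interfaces while telling peers to connect to `node1.internal.example`. Whether that name is reachable from the peer's network is exactly what the audit needs to document.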

When Spark initializes, it searches the network interfaces until it finds one 
whose address is not a loopback address. It then does a reverse DNS lookup for 
a hostname associated with that interface, and the network components use that 
hostname to advertise themselves to other processes. That hostname is also the 
one used for the Akka system identifier (Akka supports supplying only a single 
name, which it uses both as the bind interface and as the actor identifier). In 
some cases, that hostname is also used as the bind hostname (e.g. I think this 
happens in the connection manager and possibly Akka), which will likely result 
internally in the hostname being re-resolved to an IP address. In other cases 
(the web UI and netty shuffle) we seem to bind to all interfaces.
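
The discovery steps described above can be sketched roughly as follows. This is an illustrative Python approximation, not Spark's actual implementation. The function names are hypothetical, the candidate addresses are passed in rather than enumerated from the OS, and the fallback to the raw IP when the reverse lookup fails is an assumption.

```python
import ipaddress
import socket


def first_non_loopback(candidates):
    """Return the first address in `candidates` that is not loopback, or None."""
    for addr in candidates:
        if not ipaddress.ip_address(addr).is_loopback:
            return addr
    return None


def advertised_hostname(candidates, reverse_lookup=None):
    """Pick a non-loopback address and reverse-resolve it to a hostname.

    Assumption: if the reverse lookup fails, the IP string itself is returned.
    """
    if reverse_lookup is None:
        # socket.gethostbyaddr returns (hostname, aliases, addresses).
        reverse_lookup = lambda ip: socket.gethostbyaddr(ip)[0]
    addr = first_non_loopback(candidates)
    if addr is None:
        raise ValueError("no non-loopback address available")
    try:
        return reverse_lookup(addr)
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return addr
```

The `reverse_lookup` parameter exists only so the lookup can be stubbed out. It also makes the fragile part visible: the advertised name depends entirely on what reverse DNS returns for the chosen interface, which may not be resolvable by other machines (for instance inside a container).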

The best outcome would be to have three configs that can be set on each machine:

{code}
SPARK_LOCAL_IP # IP address we bind to for all services
SPARK_INTERNAL_HOSTNAME # Hostname we advertise to remote processes within the cluster
SPARK_EXTERNAL_HOSTNAME # Hostname we advertise to processes outside the cluster (e.g. the UI)
{code}

It's not clear how easily we can support that scheme while providing backwards 
compatibility. The last one (SPARK_EXTERNAL_HOSTNAME) is easy - it's just an 
alias for what is now SPARK_PUBLIC_DNS.
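
One possible way to resolve the three proposed settings while keeping SPARK_PUBLIC_DNS working is sketched below in Python. The function name `resolve_network_settings`, the returned keys, and the fallback order are assumptions made for illustration. Neither SPARK_INTERNAL_HOSTNAME nor SPARK_EXTERNAL_HOSTNAME exists yet; they are the names proposed in this issue.

```python
def resolve_network_settings(env, discovered_ip, discovered_hostname):
    """Resolve the three proposed settings from an environment mapping.

    The fallback order here is an illustrative assumption, not Spark behavior.
    """
    # Bind address: explicit setting, else the auto-discovered address.
    local_ip = env.get("SPARK_LOCAL_IP", discovered_ip)
    # Name advertised inside the cluster: explicit setting, else reverse-DNS result.
    internal = env.get("SPARK_INTERNAL_HOSTNAME", discovered_hostname)
    # SPARK_EXTERNAL_HOSTNAME is proposed as an alias for SPARK_PUBLIC_DNS,
    # so the older name is consulted when the new one is unset.
    external = (
        env.get("SPARK_EXTERNAL_HOSTNAME")
        or env.get("SPARK_PUBLIC_DNS")
        or internal
    )
    return {
        "bind_ip": local_ip,
        "internal_hostname": internal,
        "external_hostname": external,
    }
```

Calling it with `os.environ` and the discovered address and hostname would give each component one place to read its bind address and advertised names. Existing deployments that only set SPARK_PUBLIC_DNS would keep their current external hostname.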

  was:
Spark has multiple network components that start servers and advertise their 
network addresses to other processes.

We should go through each of these components and make sure they have 
consistent and/or documented behavior wrt (a) what interface(s) they bind to 
and (b) what hostname they use to advertise themselves to other processes. We 
should document this clearly and explain to people what to do in different 
cases (e.g. EC2, dockerized containers, etc).

When Spark initializes, it will search for a network interface until it finds 
one that is not a loopback address. Then it will do a reverse DNS lookup for a 
hostname associated with that interface. Then the network components will use 
that hostname to advertise the component to other processes. That hostname is 
also the one used for the akka system identifier (akka supports only supplying 
a single name which it uses both as the bind interface and as the actor 
identifier). In some cases, that hostname is used as the bind hostname also 
(e.g. I think this happens in the connection manager and possibly akka) - which 
will likely internally result in a re-resolution of this to an IP address. In 
other cases (the web UI and netty shuffle) we seem to bind to all interfaces.



[jira] [Updated] (SPARK-5113) Audit and document use of hostnames and IP addresses in Spark

2015-01-06 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5113:
---
Description: 
Spark has multiple network components that start servers and advertise their 
network addresses to other processes.

We should go through each of these components and make sure they have 
consistent and/or documented behavior wrt (a) what interface(s) they bind to 
and (b) what hostname they use to advertise themselves to other processes. We 
should document this clearly and explain to people what to do in different 
cases (e.g. EC2, dockerized containers, etc).

When Spark initializes, it will search for a network interface until it finds 
one that is not a loopback address. Then it will do a reverse DNS lookup for a 
hostname associated with that interface. Then the network components will use 
that hostname to advertise the component to other processes. In some cases, 
that hostname is used as the bind hostname also (e.g. I think this happens in 
the connection manager and possibly akka) - which will likely internally result 
in a re-resolution of this to an IP address. In other cases (the web UI and 
netty shuffle) we seem to bind to all interfaces.

  was:
Spark has multiple network components that start servers and advertise their 
network addresses to other processes.

We should go through each of these components and make sure they have 
consistent and/or documented behavior wrt (a) what interface(s) they bind to 
and (b) what hostname they use to advertise themselves to other processes. We 
should document this clearly and explain to people what to do in different 
cases (e.g. EC2, dockerized containers, etc).

When Spark initializes, it will search for a network interface until it finds 
one that is not a loopback address. Then it will do a reverse DNS lookup for a 
hostname associated with that interface. Then the network components will use 
that hostname to advertise the component to other processes. In some cases, 
that hostname is used as the bind interface also (e.g. I think this happens in 
the connection manager and possibly akka). In other cases (the web UI and netty 
shuffle) we seem to bind to all interfaces.








[jira] [Updated] (SPARK-5113) Audit and document use of hostnames and IP addresses in Spark

2015-01-06 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5113:
---
Description: 
Spark has multiple network components that start servers and advertise their 
network addresses to other processes.

We should go through each of these components and make sure they have 
consistent and/or documented behavior wrt (a) what interface(s) they bind to 
and (b) what hostname they use to advertise themselves to other processes. We 
should document this clearly and explain to people what to do in different 
cases (e.g. EC2, dockerized containers, etc).

When Spark initializes, it will search for a network interface until it finds 
one that is not a loopback address. Then it will do a reverse DNS lookup for a 
hostname associated with that interface. Then the network components will use 
that hostname to advertise the component to other processes. That hostname is 
also the one used for the akka system identifier (akka supports only supplying 
a single name which it uses both as the bind interface and as the actor 
identifier). In some cases, that hostname is used as the bind hostname also 
(e.g. I think this happens in the connection manager and possibly akka) - which 
will likely internally result in a re-resolution of this to an IP address. In 
other cases (the web UI and netty shuffle) we seem to bind to all interfaces.

  was:
Spark has multiple network components that start servers and advertise their 
network addresses to other processes.

We should go through each of these components and make sure they have 
consistent and/or documented behavior wrt (a) what interface(s) they bind to 
and (b) what hostname they use to advertise themselves to other processes. We 
should document this clearly and explain to people what to do in different 
cases (e.g. EC2, dockerized containers, etc).

When Spark initializes, it will search for a network interface until it finds 
one that is not a loopback address. Then it will do a reverse DNS lookup for a 
hostname associated with that interface. Then the network components will use 
that hostname to advertise the component to other processes. That hostname is 
also the one used for the akka system identifier. In some cases, that hostname 
is used as the bind hostname also (e.g. I think this happens in the connection 
manager and possibly akka) - which will likely internally result in a 
re-resolution of this to an IP address. In other cases (the web UI and netty 
shuffle) we seem to bind to all interfaces.








[jira] [Updated] (SPARK-5113) Audit and document use of hostnames and IP addresses in Spark

2015-01-06 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5113:
---
Description: 
Spark has multiple network components that start servers and advertise their 
network addresses to other processes.

We should go through each of these components and make sure they have 
consistent and/or documented behavior wrt (a) what interface(s) they bind to 
and (b) what hostname they use to advertise themselves to other processes. We 
should document this clearly and explain to people what to do in different 
cases (e.g. EC2, dockerized containers, etc).

When Spark initializes, it will search for a network interface until it finds 
one that is not a loopback address. Then it will do a reverse DNS lookup for a 
hostname associated with that interface. Then the network components will use 
that hostname to advertise the component to other processes. That hostname is 
also the one used for the akka system identifier. In some cases, that hostname 
is used as the bind hostname also (e.g. I think this happens in the connection 
manager and possibly akka) - which will likely internally result in a 
re-resolution of this to an IP address. In other cases (the web UI and netty 
shuffle) we seem to bind to all interfaces.

  was:
Spark has multiple network components that start servers and advertise their 
network addresses to other processes.

We should go through each of these components and make sure they have 
consistent and/or documented behavior wrt (a) what interface(s) they bind to 
and (b) what hostname they use to advertise themselves to other processes. We 
should document this clearly and explain to people what to do in different 
cases (e.g. EC2, dockerized containers, etc).

When Spark initializes, it will search for a network interface until it finds 
one that is not a loopback address. Then it will do a reverse DNS lookup for a 
hostname associated with that interface. Then the network components will use 
that hostname to advertise the component to other processes. In some cases, 
that hostname is used as the bind hostname also (e.g. I think this happens in 
the connection manager and possibly akka) - which will likely internally result 
in a re-resolution of this to an IP address. In other cases (the web UI and 
netty shuffle) we seem to bind to all interfaces.




