[jira] [Updated] (SPARK-5113) Audit and document use of hostnames and IP addresses in Spark
[ https://issues.apache.org/jira/browse/SPARK-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-5113:
-----------------------------
    Component/s: Spark Core

             Key: SPARK-5113
             URL: https://issues.apache.org/jira/browse/SPARK-5113
         Project: Spark
      Issue Type: Bug
      Components: Spark Core
        Reporter: Patrick Wendell
        Priority: Critical

Spark has multiple network components that start servers and advertise their network addresses to other processes. We should go through each of these components and make sure they have consistent and/or documented behavior with respect to (a) which interface(s) they bind to and (b) which hostname they use to advertise themselves to other processes. We should document this clearly and explain what to do in different environments (e.g. EC2, Docker containers, etc.).

When Spark initializes, it searches the network interfaces until it finds one that is not a loopback address, then performs a reverse DNS lookup to obtain a hostname for that interface. The network components then use that hostname to advertise themselves to other processes. The same hostname is used for the Akka system identifier (Akka supports supplying only a single name, which it uses both as the bind interface and as the actor identifier). In some cases that hostname is also used as the bind hostname (e.g. I think this happens in the connection manager and possibly Akka), which will likely result in an internal re-resolution back to an IP address. In other cases (the web UI and the Netty shuffle) we seem to bind to all interfaces.

The best outcome would be to have three configs that can be set on each machine:

{code}
SPARK_LOCAL_IP          # IP address we bind to for all services
SPARK_INTERNAL_HOSTNAME # Hostname we advertise to remote processes within the cluster
SPARK_EXTERNAL_HOSTNAME # Hostname we advertise to processes outside the cluster (e.g. the UI)
{code}

It's not clear how easily we can support that scheme while providing backwards compatibility. The last one (SPARK_EXTERNAL_HOSTNAME) is easy: it's just an alias for what is now SPARK_PUBLIC_DNS.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
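The interface-selection behavior described above (skip loopback, then reverse-resolve) can be sketched in Python. This is a loose, hypothetical illustration of the described logic, not Spark's actual implementation; the function name and the getaddrinfo-based candidate enumeration are assumptions for the sake of the example.

```python
import socket

def find_advertise_hostname():
    """Pick a non-loopback local address, then reverse-resolve it
    to a hostname -- a rough sketch of the behavior described in
    the ticket, not Spark's real code."""
    host_ip = None
    try:
        # Enumerate candidate addresses for the local hostname and
        # skip loopback entries (127.x.x.x and ::1).
        for info in socket.getaddrinfo(socket.gethostname(), None):
            addr = info[4][0]
            if not addr.startswith("127.") and addr != "::1":
                host_ip = addr
                break
    except socket.gaierror:
        pass
    if host_ip is None:
        host_ip = "127.0.0.1"  # nothing better found; fall back to loopback
    try:
        # Reverse DNS lookup: this resolved name is what would then be
        # advertised to other processes (and re-resolved by peers).
        hostname, _, _ = socket.gethostbyaddr(host_ip)
    except OSError:
        hostname = host_ip  # no reverse DNS entry; advertise the IP itself
    return hostname
```

This also illustrates why the ticket flags the behavior as fragile: in EC2 or a container, the reverse-resolved name may not be reachable by peers, which is exactly what the proposed bind/advertise split would address.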
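The precedence implied by the proposed three-config scheme, including the backwards-compatibility note that SPARK_EXTERNAL_HOSTNAME would alias the existing SPARK_PUBLIC_DNS, could be resolved roughly as follows. This is a sketch under the ticket's assumed variable names and defaults, not actual Spark behavior.

```python
import os

def resolve_network_config(env=None):
    """Resolve bind/advertise settings with the fallbacks sketched in
    the ticket: SPARK_EXTERNAL_HOSTNAME falls back to the existing
    SPARK_PUBLIC_DNS, and unset values fall back to the bind address."""
    if env is None:
        env = os.environ
    bind_ip = env.get("SPARK_LOCAL_IP", "0.0.0.0")        # bind-all default is assumed
    internal = env.get("SPARK_INTERNAL_HOSTNAME", bind_ip)
    external = (env.get("SPARK_EXTERNAL_HOSTNAME")
                or env.get("SPARK_PUBLIC_DNS")             # legacy alias
                or internal)
    return bind_ip, internal, external
```

For example, a cluster that only sets the legacy SPARK_PUBLIC_DNS would still get a sensible external hostname, while new deployments could set all three explicitly.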