[jira] [Assigned] (MESOS-10190) libprocess fails with "Failed to obtain the IP address for " when using CNI on some hosts

2020-09-24 Thread Benjamin Mahler (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-10190:
---

Assignee: Benjamin Mahler

> libprocess fails with "Failed to obtain the IP address for " when using 
> CNI on some hosts
> ---
>
> Key: MESOS-10190
> URL: https://issues.apache.org/jira/browse/MESOS-10190
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Affects Versions: 1.9.0
>Reporter: acecile555
>Assignee: Benjamin Mahler
>Priority: Major
>
> Hello,
>  
> We deployed CNI support, and 3 of our hosts (all identical) fail to start 
> containers when CNI is enabled. The log file shows:
> {noformat}
> E0917 16:58:11.481551 16770 process.cpp:1153] EXIT with status 1: Failed to 
> obtain the IP address for '7c4beac7-5385-4dfa-845a-beb01e13c77c'; the DNS 
> service may not be able to resolve it: Name or service not known{noformat}
> So I tried forcing LIBPROCESS_IP through an environment variable, but I saw 
> that Mesos overwrites it. I then rebuilt Mesos with additional debugging, and 
> here is the log:
> {noformat}
> Overwriting environment variable 'LIBPROCESS_IP' from '10.99.50.3' to 
> '0.0.0.0'
> E0917 16:34:49.779429 31428 process.cpp:1153] EXIT with status 1: Failed to 
> obtain the IP address for 'de65bbd8-b237-4884-ba87-7e13cb85078f'; the DNS 
> service may not be able to resolve it: Name or service not known{noformat}
> According to the code, it's expected to be set to 0.0.0.0 (MESOS-5127). So I 
> tried to understand why libprocess attempts to resolve a container run UUID 
> instead of the hostname. Here is the libprocess code:
>  
> {noformat}
> // Resolve the hostname if ip is 0.0.0.0 in case we actually have
> // a valid external IP address. Note that we need only one IP
> // address, so that other processes can send and receive and
> // don't get confused as to whom they are sending to.
> if (__address__.ip.isAny()) {
>   char hostname[512];
>
>   if (gethostname(hostname, sizeof(hostname)) < 0) {
>     PLOG(FATAL) << "Failed to initialize, gethostname";
>   }
>
>   // Lookup an IP address of local hostname, taking the first result.
>   Try<net::IP> ip = net::getIP(hostname, __address__.ip.family());
>
>   if (ip.isError()) {
>     EXIT(EXIT_FAILURE)
>       << "Failed to obtain the IP address for '" << hostname << "';"
>       << " the DNS service may not be able to resolve it: " << ip.error();
>   }
>
>   __address__.ip = ip.get();
> }
> {noformat}
>  
> This code looks perfectly fine, except that "gethostname" returns the 
> container UUID instead of a resolvable hostname. How is that even possible?
>  
> Any help would be greatly appreciated.
> Regards, Adam.
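
The check quoted above can be reproduced outside libprocess with plain POSIX
calls. The following is only a minimal sketch, not Mesos code: the helper names
localHostname and resolveIPv4 are hypothetical, and getaddrinfo is used here as
a stand-in for net::getIP, which is an assumption about how the lookup behaves.
Run from inside the affected container, it would show which name gethostname
reports and whether that name resolves.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <string>

// Hypothetical helper (not part of libprocess): returns the hostname as
// reported by gethostname(), or an empty string if the call fails.
std::string localHostname()
{
  char hostname[512];
  if (gethostname(hostname, sizeof(hostname)) < 0) {
    return "";
  }
  // POSIX does not guarantee NUL termination if the name was truncated.
  hostname[sizeof(hostname) - 1] = '\0';
  return hostname;
}

// Hypothetical helper (not part of libprocess): resolves `name` to its first
// IPv4 address. Returns true and fills `*ip` on success; otherwise returns
// false and stores a description of the failure in `*error`.
bool resolveIPv4(const std::string& name, std::string* ip, std::string* error)
{
  assert(ip != nullptr && error != nullptr);

  addrinfo hints = {};
  hints.ai_family = AF_INET;
  hints.ai_socktype = SOCK_STREAM;

  addrinfo* result = nullptr;
  const int rc = getaddrinfo(name.c_str(), nullptr, &hints, &result);
  if (rc != 0) {
    *error = gai_strerror(rc);
    return false;
  }

  // Take the first result, as the quoted libprocess comment describes.
  char buffer[INET_ADDRSTRLEN];
  const sockaddr_in* addr =
    reinterpret_cast<const sockaddr_in*>(result->ai_addr);
  const char* text =
    inet_ntop(AF_INET, &addr->sin_addr, buffer, sizeof(buffer));
  freeaddrinfo(result);

  if (text == nullptr) {
    *error = "inet_ntop failed";
    return false;
  }

  *ip = text;
  return true;
}
```

Calling resolveIPv4(localHostname(), &ip, &error) mirrors the two steps in the
quoted snippet. On glibc, "Name or service not known" is the text gai_strerror
returns for EAI_NONAME, which matches the message in the logs above.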



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (MESOS-10190) libprocess fails with "Failed to obtain the IP address for " when using CNI on some hosts

2020-09-24 Thread Benjamin Mahler (Jira)


 [ 
https://issues.apache.org/jira/browse/MESOS-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-10190:
---

Assignee: (was: Benjamin Mahler)

> libprocess fails with "Failed to obtain the IP address for " when using 
> CNI on some hosts
> ---
>
> Key: MESOS-10190
> URL: https://issues.apache.org/jira/browse/MESOS-10190
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Affects Versions: 1.9.0
>Reporter: acecile555
>Priority: Major
>


