[jira] [Commented] (SLIDER-1259) Slider does not work well in multi homed environments

2018-02-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SLIDER-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364348#comment-16364348
 ] 

Steve Loughran commented on SLIDER-1259:


And we build the hostname not through a simple getHostByName call, but from 
the address used for RPC:
{code}
  rpcServiceAddress = rpcService.getConnectAddress();
  appMasterHostname = rpcServiceAddress.getAddress().getCanonicalHostName();
{code}
... where the RPC service is bound to the wildcard address: 
{{InetSocketAddress rpcAddress = new InetSocketAddress("0.0.0.0", port);}}

I think if an explicit lookup of the yarn.nodemanager properties were to take 
place there, it would trickle through to the RPC and port registration.

Now, which of those properties to use? Or, put differently: "who knows why 
there are three different ones?"
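
One way such a lookup could work, as a minimal sketch only. The precedence order below (hostname, then bind-host, then address) is an assumption, since the issue leaves open which property should win. `pick_nm_hostname`, `host_part` and the plain dict standing in for the YARN configuration are hypothetical names for illustration, not Slider or Hadoop APIs. Wildcard values are skipped because they cannot be advertised to other hosts.

```python
# Values that mean "all interfaces" and therefore name no usable host.
WILDCARDS = {"", "0.0.0.0", "::"}

# Assumed precedence; the issue does not settle which property should win.
CANDIDATE_KEYS = (
    "yarn.nodemanager.hostname",
    "yarn.nodemanager.bind-host",
    "yarn.nodemanager.address",
)


def host_part(value):
    """Strip an optional numeric :port suffix from a host or host:port string."""
    value = value.strip()
    host, sep, port = value.rpartition(":")
    if sep and port.isdigit():
        return host
    return value


def pick_nm_hostname(conf, fallback):
    """Return the first non-wildcard host named by the candidate keys.

    conf is a dict of property name to string value; fallback is whatever
    the caller would otherwise have used (e.g. the result of getfqdn()).
    """
    for key in CANDIDATE_KEYS:
        host = host_part(conf.get(key, ""))
        if host not in WILDCARDS:
            return host
    return fallback
```

With the hostnames from this issue, a configuration whose yarn.nodemanager.hostname is the wildcard but whose yarn.nodemanager.address is f-bcpc-vm3.bcpc.example.com plus a port would yield f-bcpc-vm3.bcpc.example.com rather than the management name.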

> Slider does not work well in multi homed environments
> -
>
> Key: SLIDER-1259
> URL: https://issues.apache.org/jira/browse/SLIDER-1259
> Project: Slider
>  Issue Type: Bug
>  Components: agent
>Affects Versions: Slider 0.92
>Reporter: Lev Bronshtein
>Priority: Minor
>
> In an environment where Hadoop worker nodes bind the Node Manager to an 
> interface whose hostname differs from the one returned by socket.getfqdn(), 
> Slider picks up the wrong name. In our test environment the two names are 
> f-bcpc-vm3 and just bcpc-vm3; the latter is the hostname bound to the 
> management interface, not the interface for hadoop/production traffic. This 
> results in our inability to introspect running jobs.
>  
> For example, running *slider registry --name slider_poc --listexp* results in 
> the following output in the ResourceManager logs:
> {quote}2018-01-26 17:30:32,147 INFO 
> org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: ubuntu is 
> accessing unchecked 
> [http://bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports] which 
> is the app master GUI of application_1516910361403_0094 owned by ubuntu 
>  2018-01-26 17:31:13,639 WARN org.mortbay.log: 
> /proxy/application_1516910361403_0094/ws/v1/slider/publisher/exports: 
> java.net.ConnectException: Connection timed out (Connection timed out) 
> {quote}
>  
> Note how the redirect is to 
> [http://bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports], 
> whereas it should have been to 
> [http://f-bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports]. 
> Renaming the host to f-bcpc-vm3 results in appropriate behavior.
>  
> Perhaps *hostname.py* can be instructed to look at one of the following 
> before registering:
>  *yarn.nodemanager.address*
>  *yarn.nodemanager.bind-host*
>  *yarn.nodemanager.hostname*
>  
> When called in Register.py
> register = {'responseId': int(id),
>   'timestamp': timestamp,
>   'label': self.config.getLabel(),
>   *'publicHostname': hostname.public_hostname(),*
>   'agentVersion': version,
>   'actualState': actualState,
>   'expectedState': expectedState,
>   'allocatedPorts': allocated_ports,
>   'logFolders': log_folders,
>   'tags': tags
>  }



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SLIDER-1259) Slider does not work well in multi homed environments

2018-02-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/SLIDER-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364339#comment-16364339
 ] 

Steve Loughran commented on SLIDER-1259:


This is the webapp launched in SliderAppMaster, which is trying to come up on 
all interfaces
{code}
  WebApps.$for(SliderAMWebApp.BASE_PATH,
  WebAppApi.class,
  webAppApi,
  RestPaths.WS_CONTEXT)
 .withHttpPolicy(getConfig(), policy)
 .at("0.0.0.0", port, true)  //HERE
 .inDevMode()
 .start(webApp);
{code}
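
The difficulty with a wildcard bind is that 0.0.0.0 is fine to listen on but cannot be advertised to clients such as the RM web proxy, so the published URL has to get its host from somewhere else. A minimal sketch of keeping the bind address and the advertised address apart; `advertised_url` and its parameters are hypothetical and are not Slider or YARN APIs:

```python
import socket


def advertised_url(bind_host, port, configured_host=None, path=""):
    """Build the URL to publish for a server listening on bind_host:port."""
    if bind_host not in ("", "0.0.0.0", "::"):
        host = bind_host          # bound to one interface: advertise it
    elif configured_host:
        host = configured_host    # wildcard bind: prefer explicit config
    else:
        host = socket.getfqdn()   # last resort; may name the wrong interface
    return "http://%s:%d%s" % (host, port, path)
```

The last branch is the case reported in this issue: on a multi-homed node the name returned by getfqdn() can belong to the management interface rather than the one carrying Hadoop traffic.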
