[jira] [Commented] (SLIDER-1259) Slider does not work well in multi homed environments
[ https://issues.apache.org/jira/browse/SLIDER-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364348#comment-16364348 ]

Steve Loughran commented on SLIDER-1259:

And we are building the hostname not through a simple getHostByName call, but from the address used for RPC:
{code}
rpcServiceAddress = rpcService.getConnectAddress();
appMasterHostname = rpcServiceAddress.getAddress().getCanonicalHostName();
{code}
... where the RPC service address is created with {{InetSocketAddress rpcAddress = new InetSocketAddress("0.0.0.0", port);}}

I think that if an explicit lookup of the yarn.nodemanager properties took place there, it would trickle through to the RPC and port registration.

Now, which of those properties to use? Or, put differently: "who knows why there are three different ones?"

> Slider does not work well in multi homed environments
> -----------------------------------------------------
>
>                 Key: SLIDER-1259
>                 URL: https://issues.apache.org/jira/browse/SLIDER-1259
>             Project: Slider
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: Slider 0.92
>            Reporter: Lev Bronshtein
>            Priority: Minor
>
> In an environment where Hadoop worker nodes bind the Node Manager to an
> interface whose hostname differs from the one returned by socket.getfqdn(),
> Slider advertises the wrong hostname. In our test environment, for example,
> the difference is between f-bcpc-vm3 and just bcpc-vm3; the latter is the
> hostname bound to the management interface, not the interface for
> Hadoop/production traffic. As a result, we are unable to introspect
> running jobs.
>
> For example, running *slider registry --name slider_poc --listexp* results in
> the following output in the ResourceManager logs:
> {quote}2018-01-26 17:30:32,147 INFO
> org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: ubuntu is
> accessing unchecked
> [http://bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports] which
> is the app master GUI of application_1516910361403_0094 owned by ubuntu
> 2018-01-26 17:31:13,639 WARN org.mortbay.log:
> /proxy/application_1516910361403_0094/ws/v1/slider/publisher/exports:
> java.net.ConnectException: Connection timed out (Connection timed out)
> {quote}
>
> Note how the redirect is to
> [http://bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports],
> whereas it should have been to
> [http://f-bcpc-vm3.bcpc.example.com:46391/ws/v1/slider/publisher/exports].
> Renaming the host to f-bcpc-vm3 results in the appropriate behavior.
>
> Perhaps *hostname.py* can be instructed to look at one of the following
> properties before registering:
> *yarn.nodemanager.address*
> *yarn.nodemanager.bind-host*
> *yarn.nodemanager.hostname*
>
> When called in Register.py:
> register = {'responseId': int(id),
>             'timestamp': timestamp,
>             'label': self.config.getLabel(),
>             *'publicHostname': hostname.public_hostname(),*
>             'agentVersion': version,
>             'actualState': actualState,
>             'expectedState': expectedState,
>             'allocatedPorts': allocated_ports,
>             'logFolders': log_folders,
>             'tags': tags
>             }

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
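
The suggestion in the issue description, that hostname.py consult the yarn.nodemanager properties before registering, could be sketched roughly as follows. This is only an illustration under stated assumptions: `resolve_public_hostname`, `host_part` and the `yarn_conf` dict are hypothetical names, the lookup order is a guess (the thread leaves open which property should win), and how the agent would actually read the YARN configuration is not addressed.

```python
import socket

# Property names come from the issue description; the lookup order is an assumption.
CANDIDATE_PROPERTIES = (
    "yarn.nodemanager.hostname",
    "yarn.nodemanager.bind-host",
    "yarn.nodemanager.address",
)

# Values that mean "all interfaces" and so cannot be published as a hostname.
WILDCARD_HOSTS = {"", "0.0.0.0", "::"}


def host_part(value):
    """Drop an optional ':port' suffix, e.g. 'host:1234' -> 'host'."""
    value = value.strip()
    if value.count(":") == 1:  # exactly one colon: treat as host:port
        value = value.split(":", 1)[0]
    return value


def resolve_public_hostname(yarn_conf, fallback=socket.getfqdn):
    """Return the first usable NodeManager host found in yarn_conf.

    yarn_conf is a hypothetical dict of already-parsed YARN properties.
    """
    for name in CANDIDATE_PROPERTIES:
        host = host_part(yarn_conf.get(name, ""))
        if host not in WILDCARD_HOSTS:
            return host
    # Nothing usable is configured: fall back to socket.getfqdn(),
    # the call named in the issue description.
    return fallback()
```

With `{"yarn.nodemanager.hostname": "f-bcpc-vm3.bcpc.example.com"}` this returns the production-facing name; with only wildcard or missing values it falls back to `socket.getfqdn()`.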
[jira] [Commented] (SLIDER-1259) Slider does not work well in multi homed environments
[ https://issues.apache.org/jira/browse/SLIDER-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364339#comment-16364339 ]

Steve Loughran commented on SLIDER-1259:

This is the webapp launched in SliderAppMaster, which is trying to come up on all interfaces:
{code}
WebApps.$for(SliderAMWebApp.BASE_PATH, WebAppApi.class, webAppApi, RestPaths.WS_CONTEXT)
    .withHttpPolicy(getConfig(), policy)
    .at("0.0.0.0", port, true) //HERE
    .inDevMode()
    .start(webApp);
{code}
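
Binding the webapp to "0.0.0.0" is not itself the problem on a multi-homed host; the difficulty is that a wildcard address gives no hint about which name to publish. A minimal sketch of keeping the two concerns apart is shown below. All names here (`Endpoint`, `is_wildcard`, `plan_endpoint`) are hypothetical and are not part of Slider or Hadoop; the port and hostname are taken from the log excerpt in the issue.

```python
import ipaddress
from collections import namedtuple

# bind_host is where the server listens; advertised_host is what clients are told.
Endpoint = namedtuple("Endpoint", ["bind_host", "advertised_host", "port"])


def is_wildcard(host):
    """True for unspecified addresses such as 0.0.0.0 or ::."""
    try:
        return ipaddress.ip_address(host).is_unspecified
    except ValueError:
        return False  # not an IP literal, so treat it as a hostname


def plan_endpoint(bind_host, port, node_hostname):
    """Keep the requested bind address, but publish node_hostname if it is a wildcard."""
    advertised = node_hostname if is_wildcard(bind_host) else bind_host
    return Endpoint(bind_host, advertised, port)


endpoint = plan_endpoint("0.0.0.0", 46391, "f-bcpc-vm3.bcpc.example.com")
print("listen on %s:%d, publish http://%s:%d/" % (
    endpoint.bind_host, endpoint.port, endpoint.advertised_host, endpoint.port))
```

The design choice illustrated is simply that the listen address and the registered address are separate values, so the server can stay on all interfaces while the registry entry carries the name reachable on the production network.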