[ https://issues.apache.org/jira/browse/GEODE-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377423#comment-15377423 ]
Kevin Duling edited comment on GEODE-746 at 7/14/16 10:45 PM:
--------------------------------------------------------------

Grace and I tracked the first part of this down to a problem in {{LauncherLifecycleCommands}}:

{{String locatorHostName = StringUtils.defaultIfBlank(locatorLauncher.getHostnameForClients(), getLocalHost());}}

We've changed this to look instead at the bind address first:
{code}
String locatorHostName;
InetAddress bindAddr = locatorLauncher.getBindAddress();
if (bindAddr != null) {
  locatorHostName = bindAddr.getCanonicalHostName();
} else {
  locatorHostName = StringUtils.defaultIfBlank(locatorLauncher.getHostnameForClients(), getLocalHost());
}
{code}

This resolved the problem. The system will now connect:

{{gfsh start locator --name=locator1 --port=19991 --bind-address=192.168.1.187}}
{noformat}
Listening for transport dt_socket at address: 30000
...............
Locator in /gemfire/open/locator1 on 192.168.1.187[19991] as locator1 is currently online.
Process ID: 2765
Uptime: 1 minute 23 seconds
GemFire Version: 1.0.0-incubating-SNAPSHOT
Java Version: 1.8.0_92
Log File: /gemfire/open/locator1/locator1.log
JVM Arguments: -Dgemfire.enable-cluster-configuration=true -Dgemfire.load-cluster-configuration-from-dir=false -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=29999 -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
Class-Path: /gemfire/open/geode-assembly/build/install/apache-geode/lib/geode-core-1.0.0-incubating-SNAPSHOT.jar:/gemfire/open/geode-assembly/build/install/apache-geode/lib/geode-dependencies.jar

Successfully connected to: [host=pdx2-office-dhcp9.eng.vmware.com, port=1099]

Cluster configuration service is up and running.
{noformat}

The successfully connected message appears to be showing the wrong IP address.
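The host name selection order described above can be exercised in isolation. The following is a minimal sketch, not the actual Geode code: the class and method names are hypothetical, and a local {{defaultIfBlank}} stands in for {{StringUtils.defaultIfBlank}} so the example has no dependencies.

```java
import java.net.InetAddress;

final class LocatorHostNameSketch {

    // Stand-in for StringUtils.defaultIfBlank (assumption: null, empty, or
    // whitespace-only values count as blank).
    static String defaultIfBlank(String value, String fallback) {
        return (value == null || value.trim().isEmpty()) ? fallback : value;
    }

    // Hypothetical helper: prefer the bind address, then hostname-for-clients,
    // then the local host name.
    static String chooseLocatorHostName(InetAddress bindAddr,
                                        String hostnameForClients,
                                        String localHost) {
        if (bindAddr != null) {
            // May trigger a reverse DNS lookup; returns the textual IP
            // address if no name can be resolved.
            return bindAddr.getCanonicalHostName();
        }
        return defaultIfBlank(hostnameForClients, localHost);
    }

    public static void main(String[] args) {
        // A bind address is set: it wins over the other two candidates.
        System.out.println(chooseLocatorHostName(
            InetAddress.getLoopbackAddress(), "clients.example.com", "localhost"));
        // No bind address: hostname-for-clients is used when it is not blank.
        System.out.println(chooseLocatorHostName(null, "clients.example.com", "localhost"));
        // No bind address and a blank hostname-for-clients: fall back to the local host.
        System.out.println(chooseLocatorHostName(null, " ", "localhost"));
    }
}
```

Because {{getCanonicalHostName()}} depends on name resolution, the first printed value varies by machine; the other two are determined by the arguments alone.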
Looking at netstat, we can see that the listener is correctly bound to the IP address specified:
{noformat}
$ netstat -an | grep 19991
tcp4       0      0  192.168.1.187.19991    *.*                    LISTEN
{noformat}

The "successfully connected" hostname reports a different NIC:

{{ping pdx2-office-dhcp9.eng.vmware.com}}
{noformat}
PING pdx2-office-dhcp9.eng.vmware.com (10.118.33.209): 56 data bytes
{noformat}

Both NICs exist on this machine:

{{netstat -rn}}
{noformat}
Routing tables

Internet:
Destination        Gateway            Flags        Refs      Use   Netif Expire
default            10.118.33.253      UGSc          360        0     en4
default            192.168.1.253      UGScI          35        0     en0
{noformat}

Tracing this down, the address is coming from this line in {{ShellCommands.connectToLocator(String host, int port, int timeout, Map<String, String> props)}}:
{code}
JmxManagerLocatorResponse locatorResponse = JmxManagerLocatorRequest.send(host, port, timeout, props);

// locatorResponse: "JmxManagerLocatorResponse [host=10.118.33.209, port=1099, ssl=false, ex=null]"
// host: "192.168.1.187"
// port: 19991
// timeout: 15000
// props: size = 0
{code}

So the source of the confusion is that this is the JMX manager address, not the locator address. The formatting of the message leads one to believe it refers to the locator. Yet, if you look at the original response from the system, it correctly reports the locator's address:
{noformat}
Locator in /gemfire/open/locator1 on 192.168.1.187[19991] as locator1 is currently online.
{noformat}

I've added JMX to the "successfully connected" message to reduce confusion.
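To illustrate the distinction between the two endpoints, here is a small sketch that formats both messages so the JMX manager address cannot be mistaken for the locator address. The class, the method names, and the exact "JMX Manager" wording are assumptions made for illustration; the actual text of the changed gfsh message is not shown in this comment.

```java
final class ConnectMessageSketch {

    // Mirrors the "host[port]" form used in the locator status line.
    static String locatorOnlineMessage(String dir, String host, int port, String name) {
        return "Locator in " + dir + " on " + host + "[" + port + "] as " + name
            + " is currently online.";
    }

    // Hypothetical wording: labels the endpoint as the JMX manager so it is
    // not read as the locator address.
    static String connectedMessage(String jmxHost, int jmxPort) {
        return "Successfully connected to: JMX Manager [host=" + jmxHost
            + ", port=" + jmxPort + "]";
    }

    public static void main(String[] args) {
        // Values taken from the debugging session above.
        System.out.println(locatorOnlineMessage(
            "/gemfire/open/locator1", "192.168.1.187", 19991, "locator1"));
        System.out.println(connectedMessage("10.118.33.209", 1099));
    }
}
```

With the label in place, a JMX manager host that differs from the locator's bind address reads as expected behavior rather than as a misreported locator.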

was (Author: kduling):
Grace and I tracked the first part of this down to a problem in {{LauncherLifecycleCommands}}:

{{String locatorHostName = StringUtils.defaultIfBlank(locatorLauncher.getHostnameForClients(), getLocalHost());}}

We've changed this to look instead at the bind address first:
{code}
String locatorHostName;
InetAddress bindAddr = locatorLauncher.getBindAddress();
if (bindAddr != null) {
  locatorHostName = bindAddr.getCanonicalHostName();
} else {
  locatorHostName = StringUtils.defaultIfBlank(locatorLauncher.getHostnameForClients(), getLocalHost());
}
{code}

This improved things a little. The system will now connect:

{{gfsh start locator --name=locator1 --port=19991 --bind-address=192.168.1.187}}
{noformat}
Listening for transport dt_socket at address: 30000
...............
Locator in /gemfire/open/locator1 on 192.168.1.187[19991] as locator1 is currently online.
Process ID: 2765
Uptime: 1 minute 23 seconds
GemFire Version: 1.0.0-incubating-SNAPSHOT
Java Version: 1.8.0_92
Log File: /gemfire/open/locator1/locator1.log
JVM Arguments: -Dgemfire.enable-cluster-configuration=true -Dgemfire.load-cluster-configuration-from-dir=false -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=29999 -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
Class-Path: /gemfire/open/geode-assembly/build/install/apache-geode/lib/geode-core-1.0.0-incubating-SNAPSHOT.jar:/gemfire/open/geode-assembly/build/install/apache-geode/lib/geode-dependencies.jar

Successfully connected to: [host=pdx2-office-dhcp9.eng.vmware.com, port=1099]

Cluster configuration service is up and running.
{noformat}

But now the successfully connected message is showing the wrong IP address.
Looking at netstat, we can see that the listener is correctly bound to the IP address specified:
{noformat}
$ netstat -an | grep 19991
tcp4       0      0  192.168.1.187.19991    *.*                    LISTEN
{noformat}

Yet the hostname actually resolves to a different NIC:

{{ping pdx2-office-dhcp9.eng.vmware.com}}
{noformat}
PING pdx2-office-dhcp9.eng.vmware.com (10.118.33.209): 56 data bytes
{noformat}

Both NICs exist on this machine; one of them is just being erroneously reported:

{{netstat -rn}}
{noformat}
Routing tables

Internet:
Destination        Gateway            Flags        Refs      Use   Netif Expire
default            10.118.33.253      UGSc          360        0     en4
default            192.168.1.253      UGScI          35        0     en0
{noformat}

Tracing this down, it appears to be an incorrect response from the locator in {{ShellCommands.connectToLocator(String host, int port, int timeout, Map<String, String> props)}}:
{code}
JmxManagerLocatorResponse locatorResponse = JmxManagerLocatorRequest.send(host, port, timeout, props);

// locatorResponse: "JmxManagerLocatorResponse [host=10.118.33.209, port=1099, ssl=false, ex=null]"
// host: "192.168.1.187"
// port: 19991
// timeout: 15000
// props: size = 0
{code}


> When starting a locator using --bind-address, gfsh prints incorrect connect message
> -----------------------------------------------------------------------------------
>
>                 Key: GEODE-746
>                 URL: https://issues.apache.org/jira/browse/GEODE-746
>             Project: Geode
>          Issue Type: Improvement
>          Components: gfsh
>            Reporter: Jens Deppe
>            Assignee: Kevin Duling
>
> When starting my locator with {{gfsh start locator --name=locator1 --port=19991 --bind-address=192.168.103.1}}, the output from gfsh looks like this:
> {noformat}
> ..............................
> Locator in /Users/jdeppe/debug/locator1 on 192.168.103.1[19991] as locator1 is currently online.
> Process ID: 2666
> Uptime: 15 seconds
> GemFire Version: 8.2.0.Beta
> Java Version: 1.7.0_72
> Log File: /Users/jdeppe/debug/locator1/locator1.log
> JVM Arguments: -Dgemfire.enable-cluster-configuration=true -Dgemfire.load-cluster-configuration-from-dir=false -Dgemfire.launcher.registerSignalHandlers=true -Djava.awt.headless=true -Dsun.rmi.dgc.server.gcInterval=9223372036854775806
> Class-Path: /Users/jdeppe/gemfire/82/lib/gemfire.jar:/Users/jdeppe/gemfire/82/lib/locator-dependencies.jar
> Please use "connect --locator=192.168.1.10[19991]" to connect Gfsh to the locator.
> Failed to connect; unknown cause: Connection refused
> {noformat}
> The connect string shown is just displaying my host address and not the bind address.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)