[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652352#comment-16652352 ] ASF GitHub Bot commented on FLINK-8623: --- zentol closed pull request #5449: [FLINK-8623] ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuri… URL: https://github.com/apache/flink/pull/5449 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/flink-runtime/src/test/java/org/apache/flink/runtime/net/ConnectionUtilsTest.java b/flink-runtime/src/test/java/org/apache/flink/runtime/net/ConnectionUtilsTest.java index d7c4baae4d1..5ec3f443527 100644 --- a/flink-runtime/src/test/java/org/apache/flink/runtime/net/ConnectionUtilsTest.java +++ b/flink-runtime/src/test/java/org/apache/flink/runtime/net/ConnectionUtilsTest.java @@ -18,6 +18,7 @@ package org.apache.flink.runtime.net; +import org.junit.Assume; import org.junit.Test; import org.junit.runner.RunWith; import org.mockito.Mockito; @@ -26,11 +27,13 @@ import org.powermock.modules.junit4.PowerMockRunner; import java.io.IOException; -import java.net.Inet4Address; import java.net.InetAddress; import java.net.InetSocketAddress; import java.net.ServerSocket; +import java.net.Socket; +import java.net.SocketAddress; import java.net.UnknownHostException; +import java.net.Inet4Address; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertNotNull; @@ -53,13 +56,28 @@ public void testReturnLocalHostAddressUsingHeuristics() throws Exception { final long start = System.nanoTime(); InetAddress add = ConnectionUtils.findConnectingAddress(unreachable, 2000, 400); + // we should have found a heuristic address + assertNotNull(add); + + // make sure that the InetAddress.getLocalHost is not a loopback address + Assume.assumeFalse(InetAddress.getLocalHost().isLoopbackAddress()); + + // make sure the connection address is not a loopback address + Assume.assumeFalse(add.isLoopbackAddress()); + + Socket socket = new Socket(); + + SocketAddress socketAddress = new InetSocketAddress(add, 0); + + socket.bind(socketAddress); + + // check whether can bind to this address + assertTrue(socket.isBound()); + // check that it did not take forever (max 30 seconds) // this check can unfortunately not be too tight, or it will be flaky on some CI infrastructure assertTrue(System.nanoTime() - start < 30_000_000_000L); - // we should have found a heuristic address - assertNotNull(add); - // make sure that we returned the InetAddress.getLocalHost as a heuristic assertEquals(InetAddress.getLocalHost(), add); } This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Critical > Labels: test-stability > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652351#comment-16652351 ] ASF GitHub Bot commented on FLINK-8623: --- zentol commented on issue #5449: [FLINK-8623] ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuri… URL: https://github.com/apache/flink/pull/5449#issuecomment-430376913 Subsumed in FLINK-4052. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Critical > Labels: test-stability > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419078#comment-16419078 ] Till Rohrmann commented on FLINK-8623: -- Unblocking 1.5.0 from this issue since it seems to exist for longer. > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.3 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409363#comment-16409363 ] ASF GitHub Bot commented on FLINK-8623: --- Github user zhangminglei commented on the issue: https://github.com/apache/flink/pull/5449 I copied the Travis log here, and I think this is what we want. And the line of 63 verifies it is not a loopback address. ``` // make sure that the InetAddress.getLocalHost is not a loopback address Assume.assumeFalse(InetAddress.getLocalHost().isLoopbackAddress()); ``` ``` Tests in error: ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics:63 » AssumptionViolated Tests run: 2553, Failures: 0, Errors: 1, Skipped: 8 17:38:48.959 [INFO] 17:38:48.959 [INFO] Reactor Summary: 17:38:48.959 [INFO] 17:38:48.959 [INFO] flink-core . SUCCESS [ 55.118 s] 17:38:48.959 [INFO] flink-java . SUCCESS [ 26.140 s] 17:38:48.959 [INFO] flink-runtime .. FAILURE [11:01 min] 17:38:48.959 [INFO] flink-optimizer SKIPPED 17:38:48.959 [INFO] flink-clients .. SKIPPED 17:38:48.959 [INFO] flink-streaming-java ... SKIPPED 17:38:48.962 [INFO] flink-scala SKIPPED 17:38:48.962 [INFO] flink-test-utils ... SKIPPED``` > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.3 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16408069#comment-16408069 ] ASF GitHub Bot commented on FLINK-8623: --- Github user zhangminglei commented on the issue: https://github.com/apache/flink/pull/5449 Hi, @tillrohrmann I push a new version of code based on your suggestions. Could you please take a look when available ? Thanks. > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.3 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16406344#comment-16406344 ] ASF GitHub Bot commented on FLINK-8623: --- Github user zhangminglei commented on the issue: https://github.com/apache/flink/pull/5449 Thanks @tillrohrmann . I agree with you. This will produce a smaller refactoring without affecting the system itself. I will try soon. > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.3 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16406320#comment-16406320 ] ASF GitHub Bot commented on FLINK-8623: --- Github user tillrohrmann commented on the issue: https://github.com/apache/flink/pull/5449 @zhangminglei, I think Stephan's idea was to not change any parts of the `ConnectionUtils`. Instead we should harden the `ConnectionUtilsTest#testReturnLocalHostAddressUsingHeuristics` by adding a `Assume.assumeFalse(InetAddress.getLocalHost().isLoopbackAddress)` and checking that this is also false for the returned address. Moreover, we could check whether we can actually bind to this address. > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.3 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396383#comment-16396383 ] ASF GitHub Bot commented on FLINK-8623: --- Github user zhangminglei commented on the issue: https://github.com/apache/flink/pull/5449 Hi, @NicoK . I think ```InetAddress.getAllByName("localhost")``` wont work since we still give the specific hostname for that. And it will return a loopback address. > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.3 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395208#comment-16395208 ] ASF GitHub Bot commented on FLINK-8623: --- Github user zhangminglei commented on the issue: https://github.com/apache/flink/pull/5449 Thanks @StephanEwen and @NicoK for the response and that let me know more about details about this issue. As refers to the first that return something we can bind to. Yes, As long as it is not the loopback address and I think it is okay for that. So, What do you think of if we add a check like the below, at this moment, I do not add a retry number here. Just give an example for hack this. But this might be cause an endless loop. ``` // our attempts timed out. use the heuristic fallback LOG.warn("Could not connect to {}. Selecting a local address using heuristics.", targetAddress); InetAddress heuristic = findAddressUsingStrategy(AddressDetectionState.HEURISTIC, targetAddress, true); if (heuristic != null) { while (heuristic.toString().contains("127.0.0.1") || heuristic.toString().contains("127.0.1.1")) { heuristic = findAddressUsingStrategy(AddressDetectionState.HEURISTIC, targetAddress, true); } return heuristic; } else { LOG.warn("Could not find any IPv4 address that is not loopback or link-local. Using localhost address."); InetAddress address = InetAddress.getLocalHost(); while (address.toString().contains("127.0.0.1") || address.toString().contains("127.0.1.1")) { address = InetAddress.getLocalHost(); } return address; } ``` @StephanEwen As refers the second , I think we can use the following code to find a stuff to bind to for the test. @NicoK Could you also take a look of those ? Thanks ~ ``` Enumeration nifs = NetworkInterface.getNetworkInterfaces(); while (nifs.hasMoreElements()) { NetworkInterface nif = nifs.nextElement(); Enumeration addresses = nif.getInetAddresses(); while (addresses.hasMoreElements()) { InetAddress addr = addresses.nextElement(); if (!(addr.getHostAddress().contains("127.0.0.1") || addr.getHostAddress().contains("127.0.1.1"))) { return addr; // this is the stuff we bind to } } } } ``` > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.3 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395097#comment-16395097 ] ASF GitHub Bot commented on FLINK-8623: --- Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/5449 We cannot fully guard against incorrect network configurations. Many systems (even HDFS,, AFAIK) simply use `InetAddress.getLocalHost()` and rely on correct network configs. Our current strategy already tries to be smarter by trying to fond a connecting address, and only fall back to `getLocalHost()`. My feeling is that the strategy is fine (I have not seen connectivity issues in a while) and focus on stabilizing the tests. - We could reducing the heuristic check to "returns something we can bind to" - We could add an assumption to the test that `getLocalHost()` is not a loopback address and something we can bind to, and then run the tests. What do you think? > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.3 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395066#comment-16395066 ] ASF GitHub Bot commented on FLINK-8623: --- Github user NicoK commented on the issue: https://github.com/apache/flink/pull/5449 Thanks @StephanEwen indeed - I also did not grasp the full intend of the change. Looking at the code again, basically `ConnectionUtils#findConnectingAddress()` tries to connect with a set of strategies for a given time and if that passes, it will fall back to the heuristic. We could argue that if we do not find an interface to connect to the target address in a given time, we may not do too much about it rather than (a) failing now, (b) retrying forever, or (c) trying some heuristic that may work or fail later. The latter is implemented and seems sensible - we also print warnings to the log in that case. For the unit test, however, I think, the actual check should be reduced to "does return something and does not block" since according to the code in `ConnectionUtils` we wouldn't verify the heuristic was successful anyway: ``` // our attempts timed out. use the heuristic fallback LOG.warn("Could not connect to {}. Selecting a local address using heuristics.", targetAddress); InetAddress heuristic = findAddressUsingStrategy(AddressDetectionState.HEURISTIC, targetAddress, true); if (heuristic != null) { return heuristic; } else { LOG.warn("Could not find any IPv4 address that is not loopback or link-local. Using localhost address."); return InetAddress.getLocalHost(); } ``` We could maybe cycle through `InetAddress.getAllByName("localhost")` and verify the returned value is in there - unless I read this wrong - but I don't know if that check is of any actual value, to be honest. If we wanted to test for more, the only outside sign of the heuristic strategy being used is to look at the logs. > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.3 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395008#comment-16395008 ] ASF GitHub Bot commented on FLINK-8623: --- Github user zhangminglei commented on the issue: https://github.com/apache/flink/pull/5449 By the way, Although I have a Linux computer, I cannot access the Internet. :) I just use ```ifconfig``` check that. > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.3 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394999#comment-16394999 ] ASF GitHub Bot commented on FLINK-8623: --- Github user zhangminglei commented on the issue: https://github.com/apache/flink/pull/5449 I will verify whether retry can hack this issue. Not sure about it. > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.3 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394996#comment-16394996 ] ASF GitHub Bot commented on FLINK-8623: --- Github user zhangminglei commented on the issue: https://github.com/apache/flink/pull/5449 Thank you for your response, @StephanEwen You are very correct. We would not connect JM / RM / ZK if we use ```getByName("localhost")``` . I will give an example based I have known about this issue. We still can't believe ```getLocalHost()``` too much , as the file ```/etc/hosts``` once change the mapping of host name to an IP. That will cause a problem after that. And we will get different value( that you set the mapping including this value) Below is my machine about this file, and I can always get a correct stuff of ```ricezhang-pjhzf.vclound.com/10.199.203.242``` that works for my network if I do not set that manually. ``` 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 ``` Once I modify that file like below though, I will always get an **incorrect** value, because ```192.168.1.1``` is a a fake address. and will cause some issue. ``` 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 192.168.1.1 ricezhang-pjhzf.vclound.com ``` As refers to this **instability** test, **I don't know if something dynamically modified that file on Travis** ? Or something like that ? Even if not, we also CAN NOT rely on this method to get a IP that can connect to. Once the ```Zookeeper``` records the incorrect IP address, this will cause error. We can not get a correct IP to connect to what we want. I agree with retry or a better test ( I do not know at this moment ) approach, I will study this issue in depth in these days. > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.3 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392849#comment-16392849 ] ASF GitHub Bot commented on FLINK-8623: --- Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/5449 This breaks the expected behavior, unfortunately. The logic is as follows: - try to connect to JM / RM / ZK and use an interface that you can connect from, if possible - if that does not work, fall back to your default interface (`getLocalHost()`) - the heuristic If we change step 2 to use `getByName("localhost")` it gives us the loopback interface which no one external can connect to. That would basically mean that TM is isolated for any data communication. The change to `ConnectionUtils` cannot be made like this. Instead, I would suggest to look at why the instability happens and fix the test instead (retry or a better test approach). > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.3 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388909#comment-16388909 ] ASF GitHub Bot commented on FLINK-8623: --- Github user zhangminglei commented on the issue: https://github.com/apache/flink/pull/5449 Thanks @NicoK . Use this patch pushed here at least two times and Travis CI verifies without error for this. > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.3 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384550#comment-16384550 ] ASF GitHub Bot commented on FLINK-8623: --- Github user zhangminglei commented on the issue: https://github.com/apache/flink/pull/5449 Sorry, @NicoK I tried the ```NULL``` situation. It seems wont work. > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.3 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384451#comment-16384451 ] ASF GitHub Bot commented on FLINK-8623: --- Github user zhangminglei commented on a diff in the pull request: https://github.com/apache/flink/pull/5449#discussion_r172004051 --- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/net/ConnectionUtilsTest.java --- @@ -61,7 +61,7 @@ public void testReturnLocalHostAddressUsingHeuristics() throws Exception { assertNotNull(add); // make sure that we returned the InetAddress.getLocalHost as a heuristic - assertEquals(InetAddress.getLocalHost(), add); + assertEquals(InetAddress.getByName("localhost"), add); --- End diff -- Or, we can also still use ```InetAddress.getByName("localhost")```. Both ```null``` and ```InetAddress.getByName("localhost")``` are OKay. > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.3 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384450#comment-16384450 ] ASF GitHub Bot commented on FLINK-8623: --- Github user zhangminglei commented on a diff in the pull request: https://github.com/apache/flink/pull/5449#discussion_r172004016 --- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/net/ConnectionUtilsTest.java --- @@ -61,7 +61,7 @@ public void testReturnLocalHostAddressUsingHeuristics() throws Exception { assertNotNull(add); // make sure that we returned the InetAddress.getLocalHost as a heuristic - assertEquals(InetAddress.getLocalHost(), add); + assertEquals(InetAddress.getByName("localhost"), add); --- End diff -- Thanks @NicoK review. Yes. I think this is a better way to use ```null``` instead of ```InetAddress.getLocalHost()``` to prevent multiple adapters cache expires from getting a another result in the future. I will give a quick fix for that. > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.3 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383644#comment-16383644 ] ASF GitHub Bot commented on FLINK-8623: --- Github user NicoK commented on a diff in the pull request: https://github.com/apache/flink/pull/5449#discussion_r171858504 --- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/net/ConnectionUtilsTest.java --- @@ -61,7 +61,7 @@ public void testReturnLocalHostAddressUsingHeuristics() throws Exception { assertNotNull(add); // make sure that we returned the InetAddress.getLocalHost as a heuristic - assertEquals(InetAddress.getLocalHost(), add); + assertEquals(InetAddress.getByName("localhost"), add); --- End diff -- How about also changing the blocker above to accept connections on any interface just to rule accidental successful resolve attempts out, too, i.e. using `ServerSocket blocker = new ServerSocket(0, 1, null)`? > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.3 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359307#comment-16359307 ] ASF GitHub Bot commented on FLINK-8623: --- GitHub user zhangminglei opened a pull request: https://github.com/apache/flink/pull/5449 [FLINK-8623] ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuri… ## What is the purpose of the change Try to fix ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis. ## Brief change log Change the behavior get localhost ```InetAddress``` to ```InetAddress.getByName("localhost")``` instead of ```InetAddress.getLocalHost()```, as the latter one will return the actual IP of one of your network adapters. And in general ```InetAddress.getLocalHost()``` should be avoided. ## Verifying this change This change is already covered by existing tests, is ```testReturnLocalHostAddressUsingHeuristics``` in ```ConnectionUtilsTest.java```. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no ) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not documented) You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhangminglei/flink flink-8623 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5449.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5449 commit 67e70d558c7d2f47ed3f7a407ad250518a2f683c Author: zhangmingleiDate: 2018-02-10T07:16:09Z [FLINK-8623] ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Critical > Labels: test-stability > Fix For: 1.5.0 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359303#comment-16359303 ] mingleizhang commented on FLINK-8623: - {\{getLocalHost()}} returns the actual IP of one of our network adapters. So, \{{InetAddress.getByName('localhost')}} should give the IP address that we expect. I will give a PR and watch what will happened with the new code. > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Critical > Labels: test-stability > Fix For: 1.5.0 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359289#comment-16359289 ] mingleizhang commented on FLINK-8623: - Hi, [~till.rohrmann] The expected address is 127.0.1.1, but the actual value is 127.0.0.1. Actually, they are both to loopback address. So, I think this issue was caused by CI servers which based on different /etc/hosts file. What do you think ? Thanks. > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Critical > Labels: test-stability > Fix For: 1.5.0 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)