[jira] [Commented] (FLINK-25749) YARNSessionFIFOSecuredITCase.testDetachedMode fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-25749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480219#comment-17480219 ]

Till Rohrmann commented on FLINK-25749:
---------------------------------------

The problem is indeed caused by https://github.com/apache/flink/commit/dd6069fabf8a7ff65fbd9ff8dd7b0c47f492288f#diff-5ff30e09fc23978573250c9d95969a549be12648c085bd581696cf0b84da3a0b. Because of the newly introduced shutdown hook, the TM can deregister from the RM, which queues up an operation in the {{NMClientAsync}}. If the RM then stops and closes the {{NMClientAsync}}, this can lead to exceptions being logged. Luckily, https://github.com/apache/flink/pull/18169 will solve this problem properly ([~dmvk], correct me if I got anything wrong).

> YARNSessionFIFOSecuredITCase.testDetachedMode fails on AZP
> ----------------------------------------------------------
>
> Key: FLINK-25749
> URL: https://issues.apache.org/jira/browse/FLINK-25749
> Project: Flink
> Issue Type: Bug
> Components: Deployment / YARN
> Affects Versions: 1.15.0
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Priority: Critical
> Labels: test-stability
>
> The test {{YARNSessionFIFOSecuredITCase.testDetachedMode}} fails on AZP:
> {code}
> 2022-01-21T03:28:18.3712993Z Jan 21 03:28:18 java.lang.AssertionError:
> 2022-01-21T03:28:18.3715115Z Jan 21 03:28:18 Found a file /__w/2/s/flink-yarn-tests/target/flink-yarn-tests-fifo-secured/flink-yarn-tests-fifo-secured-logDir-nm-0_0/application_1642735639007_0002/container_1642735639007_0002_01_01/jobmanager.log with a prohibited string (one of [Exception, Started SelectChannelConnector@0.0.0.0:8081]). Excerpts:
> 2022-01-21T03:28:18.3716389Z Jan 21 03:28:18 [
> 2022-01-21T03:28:18.3717531Z Jan 21 03:28:18 2022-01-21 03:27:56,921 INFO org.apache.flink.runtime.resourcemanager.ResourceManagerServiceImpl [] - Resource manager service is not running. Ignore revoking leadership.
> 2022-01-21T03:28:18.3720496Z Jan 21 03:28:18 2022-01-21 03:27:56,922 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Stopped dispatcher akka.tcp://flink@11c5f741db81:37697/user/rpc/dispatcher_0.
> 2022-01-21T03:28:18.3722401Z Jan 21 03:28:18 2022-01-21 03:27:56,922 INFO org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl [] - Interrupted while waiting for queue
> 2022-01-21T03:28:18.3723661Z Jan 21 03:28:18 java.lang.InterruptedException: null
> 2022-01-21T03:28:18.3724529Z Jan 21 03:28:18 at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014) ~[?:1.8.0_292]
> 2022-01-21T03:28:18.3725450Z Jan 21 03:28:18 at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048) ~[?:1.8.0_292]
> 2022-01-21T03:28:18.3726239Z Jan 21 03:28:18 at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) ~[?:1.8.0_292]
> 2022-01-21T03:28:18.3727618Z Jan 21 03:28:18 at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:323) [hadoop-yarn-client-2.8.5.jar:?]
> 2022-01-21T03:28:18.3729147Z Jan 21 03:28:18 2022-01-21 03:27:56,927 WARN org.apache.hadoop.ipc.Client [] - Failed to connect to server: 11c5f741db81/172.25.0.2:39121: retries get failed due to exceeded maximum allowed retries number: 0
> 2022-01-21T03:28:18.3730293Z Jan 21 03:28:18 java.nio.channels.ClosedByInterruptException: null
> 2022-01-21T03:28:18.3730834Z Jan 21 03:28:18 java.nio.channels.ClosedByInterruptException: null
> 2022-01-21T03:28:18.3731499Z Jan 21 03:28:18 at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) ~[?:1.8.0_292]
> 2022-01-21T03:28:18.3732203Z Jan 21 03:28:18 at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:658) ~[?:1.8.0_292]
> 2022-01-21T03:28:18.3733478Z Jan 21 03:28:18 at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192) ~[hadoop-common-2.8.5.jar:?]
> 2022-01-21T03:28:18.3734470Z Jan 21 03:28:18 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) ~[hadoop-common-2.8.5.jar:?]
> 2022-01-21T03:28:18.3735432Z Jan 21 03:28:18 at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:685) [hadoop-common-2.8.5.jar:?]
> 2022-01-21T03:28:18.3736414Z Jan 21 03:28:18 at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:788) [hadoop-common-2.8.5.jar:?]
> 2022-01-21T03:28:18.3737734Z Jan 21 03:28:18 at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:410) [hadoop-common-2.8.5.jar:?]
> 2022-01-21T03:28:18.3738853Z Jan 21 03:28:18 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1550) [hadoop-common-2.8.5.jar:?]
> 2022-01-21T03:28:18.3739752Z Jan 21 03:28:18 at org.apache.hadoop.ipc.Client.call(Client.java:1381) [hadoop-common-2.8.5.jar:?]
> 2022-01-21T03:28:18.3740638Z Jan 21 03:28:18 at org.apache.hadoop.ipc.Client.call(Client.java:1345) [hadoop-common-2.8.5.jar:?]
> 2022-01-21T03:28:18.3741589Z Jan 21 03:28:18 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(
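
The failure mode described in this message can be illustrated with a small, self-contained sketch. It is only an illustration under stated assumptions: {{InterruptedWorkerDemo}}, {{runAndCaptureLog}} and {{containsProhibited}} are hypothetical names, the in-memory list stands in for {{jobmanager.log}}, and none of this uses or reproduces the YARN client classes or Flink's test utilities. A handler thread blocks on an empty queue, is interrupted while the surrounding service stops, and logs the resulting stack trace. A scan for prohibited strings then flags the log even though nothing actually failed.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stand-in for an async client with a callback-handler thread (not the YARN API). */
public class InterruptedWorkerDemo {

    /** Starts a handler thread on an empty queue, interrupts it, and returns what it logged. */
    public static List<String> runAndCaptureLog() {
        final List<String> log = new CopyOnWriteArrayList<>();
        final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

        Thread handler = new Thread(() -> {
            try {
                while (true) {
                    queue.take().run(); // blocks because nothing is ever queued
                }
            } catch (InterruptedException e) {
                // Harmless shutdown path, but the logged trace contains the word "Exception".
                StringWriter trace = new StringWriter();
                e.printStackTrace(new PrintWriter(trace, true));
                log.add("INFO Interrupted while waiting for queue");
                log.add(trace.toString());
            }
        }, "callback-handler");

        handler.start();
        handler.interrupt(); // in this sketch, stopping the client interrupts its handler thread
        try {
            handler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the handler", e);
        }
        return log;
    }

    /** Returns true if any log line contains one of the prohibited strings. */
    public static boolean containsProhibited(List<String> lines, List<String> prohibited) {
        for (String line : lines) {
            for (String p : prohibited) {
                if (line.contains(p)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> log = runAndCaptureLog();
        List<String> prohibited =
                Arrays.asList("Exception", "Started SelectChannelConnector@0.0.0.0:8081");
        System.out.println("prohibited string found: " + containsProhibited(log, prohibited));
    }
}
```

Running {{main}} should report that a prohibited string was found. In this sketch the check fires on an INFO-level shutdown message, which is why avoiding the exception during shutdown is a more direct remedy than tuning timeouts.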
[jira] [Commented] (FLINK-25749) YARNSessionFIFOSecuredITCase.testDetachedMode fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-25749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480214#comment-17480214 ]

David Morávek commented on FLINK-25749:
---------------------------------------

Merging https://github.com/apache/flink/pull/18446 after the CI passes should fix the issue
[jira] [Commented] (FLINK-25749) YARNSessionFIFOSecuredITCase.testDetachedMode fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-25749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480212#comment-17480212 ]

Till Rohrmann commented on FLINK-25749:
---------------------------------------

David raised the suspicion that the instability could be caused by https://github.com/apache/flink/commit/dd6069fabf8a7ff65fbd9ff8dd7b0c47f492288f#diff-5ff30e09fc23978573250c9d95969a549be12648c085bd581696cf0b84da3a0b. Let me quickly double check whether I can reproduce it. Since I am responsible for this change, let me first try to clean up my mess.
[jira] [Commented] (FLINK-25749) YARNSessionFIFOSecuredITCase.testDetachedMode fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-25749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480118#comment-17480118 ]

Junfan Zhang commented on FLINK-25749:
--------------------------------------

All of the failures above look like lost connections to the mini YARN cluster, possibly due to an unstable network. Maybe we could tune the IPC client configuration to work around this, for example:
{code:java}
yarnClusterConf.setInt("ipc.client.connection.maxidletime", 1000);
yarnClusterConf.setInt("ipc.client.connect.max.retries", 3);
yarnClusterConf.setInt("ipc.client.connect.retry.interval", 10);
yarnClusterConf.setInt("ipc.client.connect.timeout", 1000);
yarnClusterConf.setInt("ipc.client.connect.max.retries.on.timeouts", 3);
{code}
[~trohrmann] What do you think? I could take over this ticket to improve test stability.
[jira] [Commented] (FLINK-25749) YARNSessionFIFOSecuredITCase.testDetachedMode fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-25749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480057#comment-17480057 ]

Till Rohrmann commented on FLINK-25749:
---------------------------------------

Here {{YARNSessionFIFOITCase.testDetachedMode}} failed, but probably for the same reason.

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=29871&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461&l=30735
[jira] [Commented] (FLINK-25749) YARNSessionFIFOSecuredITCase.testDetachedMode fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-25749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480056#comment-17480056 ]

Till Rohrmann commented on FLINK-25749:
---------------------------------------

Another instance: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=29867&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=995c650b-6573-581c-9ce6-7ad4cc038461&l=30719
[jira] [Commented] (FLINK-25749) YARNSessionFIFOSecuredITCase.testDetachedMode fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-25749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17479930#comment-17479930 ]

Till Rohrmann commented on FLINK-25749:
---------------------------------------

Another instance: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=29841&view=logs&j=245e1f2e-ba5b-5570-d689-25ae21e5302f&t=d04c9862-880c-52f5-574b-a7a79fef8e0f