[jira] [Updated] (FLINK-29618) YARNSessionFIFOSecuredITCase.testDetachedMode timed out in Azure CI

2023-05-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-29618:
---
Labels: pull-request-available starter test-stability  (was: starter 
test-stability)

> YARNSessionFIFOSecuredITCase.testDetachedMode timed out in Azure CI
> ---
>
> Key: FLINK-29618
> URL: https://issues.apache.org/jira/browse/FLINK-29618
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, Tests
>Affects Versions: 1.17.0
>Reporter: Matthias Pohl
>Assignee: Wencong Liu
>Priority: Major
>  Labels: pull-request-available, starter, test-stability
> Attachments: 
> build-20221012.7.YARNSessionFIFOSecuredITCase.testDetachedMode.log
>
>
> We experienced a [build 
> failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41931=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=30284]
>  that was caused (exclusively) by 
> {{YARNSessionFIFOSecuredITCase.testDetachedMode}} running into a timeout.
> The test-specific logs that were extracted from the build are attached to 
> this Jira issue.
> JUnit tries to stop the thread running the test but fails to do so because 
> it's interrupting a sleep. The {{InterruptedException}} is not properly 
> handled in 
> [YarnTestBase:744|https://github.com/apache/flink/blob/573ed922346c791760d27653543c2b8df56f51f7/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YarnTestBase.java#L744]
>  (it doesn't forward the exception). Therefore, we only see the warning being 
> logged after 60s:
> {code}
> 11:33:51,124 [ForkJoinPool-1-worker-25] WARN  
> org.apache.flink.yarn.YarnTestBase   [] - Interruped
> java.lang.InterruptedException: sleep interrupted
> at java.lang.Thread.sleep(Native Method) ~[?:1.8.0_292]
> at org.apache.flink.yarn.YarnTestBase.sleep(YarnTestBase.java:716) 
> ~[test-classes/:?]
> at 
> org.apache.flink.yarn.YarnTestBase.startWithArgs(YarnTestBase.java:906) 
> ~[test-classes/:?]
> at 
> org.apache.flink.yarn.YARNSessionFIFOITCase.runDetachedModeTest(YARNSessionFIFOITCase.java:141)
>  ~[test-classes/:?]
> at 
> org.apache.flink.yarn.YARNSessionFIFOSecuredITCase.lambda$testDetachedMode$2(YARNSessionFIFOSecuredITCase.java:173)
>  ~[test-classes/:?]
> at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:288) 
> ~[test-classes/:?]
> at 
> org.apache.flink.yarn.YARNSessionFIFOSecuredITCase.testDetachedMode(YARNSessionFIFOSecuredITCase.java:160)
>  ~[test-classes/:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_292]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_292]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_292]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
> [...]
> {code}
> The test code itself eventually continues and succeeds (despite the 
> interruption). The job submission takes suspiciously long, though.
> Removing the timeout from the test (as this is the desired approach for tests 
> in general now) should solve this test instability.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (FLINK-29618) YARNSessionFIFOSecuredITCase.testDetachedMode timed out in Azure CI

2023-05-11 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-29618:
--
Description: 
We experienced a [build 
failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41931=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=30284]
 that was caused (exclusively) by 
{{YARNSessionFIFOSecuredITCase.testDetachedMode}} running into a timeout.

The test-specific logs that were extracted from the build are attached to 
this Jira issue.

JUnit tries to stop the thread running the test but fails to do so because 
it's interrupting a sleep. The {{InterruptedException}} is not properly handled 
in 
[YarnTestBase:744|https://github.com/apache/flink/blob/573ed922346c791760d27653543c2b8df56f51f7/flink-yarn-tests/src/test/java/org/apache/flink/yarn/YarnTestBase.java#L744]
 (it doesn't forward the exception). Therefore, we only see the warning being 
logged after 60s:
{code}
11:33:51,124 [ForkJoinPool-1-worker-25] WARN  
org.apache.flink.yarn.YarnTestBase   [] - Interruped
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method) ~[?:1.8.0_292]
at org.apache.flink.yarn.YarnTestBase.sleep(YarnTestBase.java:716) 
~[test-classes/:?]
at 
org.apache.flink.yarn.YarnTestBase.startWithArgs(YarnTestBase.java:906) 
~[test-classes/:?]
at 
org.apache.flink.yarn.YARNSessionFIFOITCase.runDetachedModeTest(YARNSessionFIFOITCase.java:141)
 ~[test-classes/:?]
at 
org.apache.flink.yarn.YARNSessionFIFOSecuredITCase.lambda$testDetachedMode$2(YARNSessionFIFOSecuredITCase.java:173)
 ~[test-classes/:?]
at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:288) 
~[test-classes/:?]
at 
org.apache.flink.yarn.YARNSessionFIFOSecuredITCase.testDetachedMode(YARNSessionFIFOSecuredITCase.java:160)
 ~[test-classes/:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.8.0_292]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_292]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_292]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
[...]
{code}

The test code itself eventually continues and succeeds (despite the 
interruption). The job submission takes suspiciously long, though.

Removing the timeout from the test (as this is the desired approach for tests 
in general now) should solve this test instability.
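
A minimal sketch of the interruption behaviour described above, not the actual YarnTestBase code: the class and method names (InterruptHandlingSketch, sleepSwallowing, sleepForwarding, loopCompletesDespiteInterrupt) are hypothetical and only illustrate the difference between a sleep helper that logs and drops an InterruptedException and one that forwards it. It assumes a polling loop similar in spirit to the one the stack trace passes through.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongConsumer;

class InterruptHandlingSketch {

    /** Hypothetical helper: logs the interruption and returns, so the caller keeps going. */
    static void sleepSwallowing(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            System.out.println("WARN interrupted, but not forwarded");
        }
    }

    /** Hypothetical helper: restores the interrupt flag and forwards the interruption. */
    static void sleepForwarding(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
    }

    /**
     * Runs a short polling loop on a worker thread, interrupts the worker once
     * (standing in for a test timeout), and reports whether the loop still ran
     * to completion.
     */
    static boolean loopCompletesDespiteInterrupt(LongConsumer sleeper) {
        AtomicBoolean completed = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    sleeper.accept(50L);
                }
                completed.set(true);
            } catch (IllegalStateException e) {
                // The interruption was forwarded, so the loop stopped early.
            }
        });
        worker.start();
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while joining worker", e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // Swallowed interruption: the loop keeps polling and finishes.
        System.out.println(loopCompletesDespiteInterrupt(InterruptHandlingSketch::sleepSwallowing));
        // Forwarded interruption: the loop stops at the first interrupted sleep.
        System.out.println(loopCompletesDespiteInterrupt(InterruptHandlingSketch::sleepForwarding));
    }
}
```

With the swallowing helper the worker finishes its loop even though it was interrupted, which matches the observation that the test code continues after the timeout. With the forwarding helper the loop stops at the first interrupted sleep.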


  was:
We experienced a [build 
failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41931=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=30284]
 that was caused (exclusively) by 
{{YARNSessionFIFOSecuredITCase.testDetachedMode}} running into a timeout.

The actual issue might be that the test thread failed due to an 
{{InterruptedException}} while waiting for the job to be submitted:
{code}
11:33:51,124 [ForkJoinPool-1-worker-25] WARN  
org.apache.flink.yarn.YarnTestBase   [] - Interruped
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method) ~[?:1.8.0_292]
at org.apache.flink.yarn.YarnTestBase.sleep(YarnTestBase.java:716) 
~[test-classes/:?]
at 
org.apache.flink.yarn.YarnTestBase.startWithArgs(YarnTestBase.java:906) 
~[test-classes/:?]
at 
org.apache.flink.yarn.YARNSessionFIFOITCase.runDetachedModeTest(YARNSessionFIFOITCase.java:141)
 ~[test-classes/:?]
at 
org.apache.flink.yarn.YARNSessionFIFOSecuredITCase.lambda$testDetachedMode$2(YARNSessionFIFOSecuredITCase.java:173)
 ~[test-classes/:?]
at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:288) 
~[test-classes/:?]
at 
org.apache.flink.yarn.YARNSessionFIFOSecuredITCase.testDetachedMode(YARNSessionFIFOSecuredITCase.java:160)
 ~[test-classes/:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.8.0_292]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_292]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_292]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
[...]
{code}

The test-specific logs that were extracted from the build are attached to 
this Jira issue.


> YARNSessionFIFOSecuredITCase.testDetachedMode timed out in Azure CI
> ---
>
> Key: FLINK-29618
> URL: https://issues.apache.org/jira/browse/FLINK-29618
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, Tests
>Affects Versions: 1.17.0
>Reporter: Matthias Pohl
>

[jira] [Updated] (FLINK-29618) YARNSessionFIFOSecuredITCase.testDetachedMode timed out in Azure CI

2023-05-11 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-29618:
--
Labels: starter test-stability  (was: test-stability)

> YARNSessionFIFOSecuredITCase.testDetachedMode timed out in Azure CI
> ---
>
> Key: FLINK-29618
> URL: https://issues.apache.org/jira/browse/FLINK-29618
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, Tests
>Affects Versions: 1.17.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: starter, test-stability
> Attachments: 
> build-20221012.7.YARNSessionFIFOSecuredITCase.testDetachedMode.log
>
>
> We experienced a [build 
> failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41931=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=30284]
>  that was caused (exclusively) by 
> {{YARNSessionFIFOSecuredITCase.testDetachedMode}} running into a timeout.
> The actual issue might be that the test thread failed due to an 
> {{InterruptedException}} while waiting for the job to be submitted:
> {code}
> 11:33:51,124 [ForkJoinPool-1-worker-25] WARN  
> org.apache.flink.yarn.YarnTestBase   [] - Interruped
> java.lang.InterruptedException: sleep interrupted
> at java.lang.Thread.sleep(Native Method) ~[?:1.8.0_292]
> at org.apache.flink.yarn.YarnTestBase.sleep(YarnTestBase.java:716) 
> ~[test-classes/:?]
> at 
> org.apache.flink.yarn.YarnTestBase.startWithArgs(YarnTestBase.java:906) 
> ~[test-classes/:?]
> at 
> org.apache.flink.yarn.YARNSessionFIFOITCase.runDetachedModeTest(YARNSessionFIFOITCase.java:141)
>  ~[test-classes/:?]
> at 
> org.apache.flink.yarn.YARNSessionFIFOSecuredITCase.lambda$testDetachedMode$2(YARNSessionFIFOSecuredITCase.java:173)
>  ~[test-classes/:?]
> at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:288) 
> ~[test-classes/:?]
> at 
> org.apache.flink.yarn.YARNSessionFIFOSecuredITCase.testDetachedMode(YARNSessionFIFOSecuredITCase.java:160)
>  ~[test-classes/:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_292]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_292]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_292]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
> [...]
> {code}
> The test-specific logs that were extracted from the build are attached to 
> this Jira issue.





[jira] [Updated] (FLINK-29618) YARNSessionFIFOSecuredITCase.testDetachedMode timed out in Azure CI

2022-10-13 Thread Matthias Pohl (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-29618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Pohl updated FLINK-29618:
--
Attachment: 
build-20221012.7.YARNSessionFIFOSecuredITCase.testDetachedMode.log

> YARNSessionFIFOSecuredITCase.testDetachedMode timed out in Azure CI
> ---
>
> Key: FLINK-29618
> URL: https://issues.apache.org/jira/browse/FLINK-29618
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN, Tests
>Affects Versions: 1.17.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: test-stability
> Attachments: 
> build-20221012.7.YARNSessionFIFOSecuredITCase.testDetachedMode.log
>
>
> We experienced a [build 
> failure|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=41931=logs=fc5181b0-e452-5c8f-68de-1097947f6483=995c650b-6573-581c-9ce6-7ad4cc038461=30284]
>  that was caused (exclusively) by 
> {{YARNSessionFIFOSecuredITCase.testDetachedMode}} running into a timeout.
> The actual issue might be that the test thread failed due to an 
> {{InterruptedException}} while waiting for the job to be submitted:
> {code}
> 11:33:51,124 [ForkJoinPool-1-worker-25] WARN  
> org.apache.flink.yarn.YarnTestBase   [] - Interruped
> java.lang.InterruptedException: sleep interrupted
> at java.lang.Thread.sleep(Native Method) ~[?:1.8.0_292]
> at org.apache.flink.yarn.YarnTestBase.sleep(YarnTestBase.java:716) 
> ~[test-classes/:?]
> at 
> org.apache.flink.yarn.YarnTestBase.startWithArgs(YarnTestBase.java:906) 
> ~[test-classes/:?]
> at 
> org.apache.flink.yarn.YARNSessionFIFOITCase.runDetachedModeTest(YARNSessionFIFOITCase.java:141)
>  ~[test-classes/:?]
> at 
> org.apache.flink.yarn.YARNSessionFIFOSecuredITCase.lambda$testDetachedMode$2(YARNSessionFIFOSecuredITCase.java:173)
>  ~[test-classes/:?]
> at org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:288) 
> ~[test-classes/:?]
> at 
> org.apache.flink.yarn.YARNSessionFIFOSecuredITCase.testDetachedMode(YARNSessionFIFOSecuredITCase.java:160)
>  ~[test-classes/:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_292]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_292]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_292]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_292]
> [...]
> {code}
> The test-specific logs that were extracted from the build are attached to 
> this Jira issue.


