[jira] [Commented] (FLINK-35446) FileMergingSnapshotManagerBase throws a NullPointerException

2024-05-27 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849767#comment-17849767
 ] 

Ryan Skraba commented on FLINK-35446:
-

Thanks for the fix!  There were a bunch of failures over the weekend before the 
merge to master:

* 1.20 Default (Java 8) / Test (module: table) 
https://github.com/apache/flink/actions/runs/9249920179/job/25442781056#step:10:12157
* 1.20 Java 8 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9248172120/job/25438340923#step:10:8501
* 1.20 Java 17 / Test (module: table) 
https://github.com/apache/flink/actions/runs/9248172120/job/25438314807#step:10:11974
* 1.20 Java 17 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9248172120/job/25438315031#step:10:8441
* 1.20 Java 21 / Test (module: table) 
https://github.com/apache/flink/actions/runs/9248172120/job/25438306000#step:10:12064
* 1.20 Java 21 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9248172120/job/25438306359#step:10:9072
* 1.20 Hadoop 3.1.3 / Test (module: table) 
https://github.com/apache/flink/actions/runs/9248172120/job/25438381891#step:10:12151
* 1.20 Hadoop 3.1.3 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9248172120/job/25438382250#step:10:8131
* 1.20 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/9248172120/job/25438295648#step:10:12081
* 1.20 Default (Java 8) / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9248089774/job/25438060032#step:10:8040
* 1.20 Default (Java 8) / Test (module: table) 
https://github.com/apache/flink/actions/runs/9244756333/job/25430934260#step:10:11992
* 1.20 Default (Java 8) / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9244756333/job/25430934479#step:10:8471
* 1.20 Java 8 / Test (module: table) 
https://github.com/apache/flink/actions/runs/9239908683/job/25419730553#step:10:11972
* 1.20 Java 11 / Test (module: table) 
https://github.com/apache/flink/actions/runs/9239908683/job/25419746284#step:10:11933
* 1.20 Java 17 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9239908683/job/25419747284#step:10:8437
* 1.20 Default (Java 8) / Test (module: table) 
https://github.com/apache/flink/actions/runs/9236391640/job/25412610305#step:10:12028
* 1.20 Default (Java 8) / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9236391640/job/25412610424#step:10:8615
* 1.20 Java 8 / Test (module: table) 
https://github.com/apache/flink/actions/runs/9232146809/job/25403130654#step:10:11954
* 1.20 Java 17 / Test (module: table) 
https://github.com/apache/flink/actions/runs/9232146809/job/25403143495#step:10:12425
* 1.20 Java 17 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9232146809/job/25403143840#step:10:8431
* 1.20 Java 21 / Test (module: table) 
https://github.com/apache/flink/actions/runs/9232146809/job/25403134721#step:10:11960
* 1.20 Java 21 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9232146809/job/25403134721#step:10:11960
* 1.20 Hadoop 3.1.3 / Test (module: table) 
https://github.com/apache/flink/actions/runs/9232146809/job/25403165764#step:10:12305
* 1.20 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/9232146809/job/25403133340#step:10:12266
* 1.20 AdaptiveScheduler / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9232146809/job/25403133470#step:10:8553

Unfortunately, I think these two failures happened on master **after** the fix 
was merged -- do you think something was missed?  This can definitely be 
verified with the next nightly build!

* 1.20 Default (Java 8) / Test (module: table) 
https://github.com/apache/flink/actions/runs/9250759677/job/25445310702#step:10:12049
* 1.20 Default (Java 8) / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9250759677/job/25445311108#step:10:8510

> FileMergingSnapshotManagerBase throws a NullPointerException
> 
>
> Key: FLINK-35446
> URL: https://issues.apache.org/jira/browse/FLINK-35446
> Project: Flink
> Issue Type: Bug
> Reporter: Ryan Skraba
> Assignee: Zakelly Lan
> Priority: Critical
> Labels: pull-request-available, test-stability
> Fix For: 1.20.0
>
>
> * 1.20 Java 11 / Test (module: tests) 
> https://github.com/apache/flink/actions/runs/9217608897/job/25360103124#step:10:8641
> {{ResumeCheckpointManuallyITCase.testExternalizedIncrementalRocksDBCheckpointsWithLocalRecoveryZookeeper}}
>  throws a NullPointerException when it tries to restore state handles: 
> {code}
> Error: 02:57:52 02:57:52.551 [ERROR] Tests run: 48, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 268.6 s <<< FAILURE! -- in 
> 

[jira] [Commented] (FLINK-35380) ResumeCheckpointManuallyITCase hanging on tests

2024-05-27 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849765#comment-17849765
 ] 

Ryan Skraba commented on FLINK-35380:
-

* 1.20 Java 21 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9239908683/job/25419736576#step:10:11668
* 1.20 Hadoop 3.1.3 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9239908683/job/25419763729#step:10:12152

> ResumeCheckpointManuallyITCase hanging on tests 
> 
>
> Key: FLINK-35380
> URL: https://issues.apache.org/jira/browse/FLINK-35380
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.20.0
> Reporter: Ryan Skraba
> Priority: Critical
> Labels: test-stability
>
> * 1.20 Default (Java 8) / Test (module: tests) 
> https://github.com/apache/flink/actions/runs/9105407291/job/25031170942#step:10:11841
>  
> (This is a slightly different error, waiting in a different place than 
> FLINK-28319)
> {code}
> May 16 03:23:58 
> ==
> May 16 03:23:58 Process produced no output for 900 seconds.
> May 16 03:23:58 
> ==
> ... snip until stack trace ...
> May 16 03:23:58   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> May 16 03:23:58   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> May 16 03:23:58   at 
> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:410)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:378)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:318)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFullRocksDBCheckpointsWithLocalRecoveryStandalone(ResumeCheckpointManuallyITCase.java:133)
> {code}
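
For illustration only: the trace above is parked in {{CountDownLatch.await()}}, which has no timeout, so a latch that is never counted down blocks until the CI watchdog kills the process. A minimal sketch of a bounded wait follows. The class and method names are hypothetical and are not taken from {{ResumeCheckpointManuallyITCase}}; it only shows the standard {{await(long, TimeUnit)}} overload.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Flink: waits on a latch with an upper bound.
public class LatchTimeoutSketch {

    /**
     * Returns true if the latch reached zero within the timeout.
     * Returns false on timeout or interruption instead of blocking forever.
     */
    public static boolean awaitOrReport(CountDownLatch latch, long timeout, TimeUnit unit) {
        try {
            boolean released = latch.await(timeout, unit);
            if (!released) {
                // Report the remaining count so the log shows what was still pending.
                System.err.println(
                        "Latch still at count " + latch.getCount()
                                + " after " + timeout + " " + unit);
            }
            return released;
        } catch (InterruptedException e) {
            // Restore the interrupt flag for callers further up the stack.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch released = new CountDownLatch(1);
        released.countDown();
        System.out.println(awaitOrReport(released, 100, TimeUnit.MILLISECONDS));

        CountDownLatch stuck = new CountDownLatch(1);
        System.out.println(awaitOrReport(stuck, 100, TimeUnit.MILLISECONDS));
    }
}
```

A bounded wait would not fix the underlying hang, but the test would fail with a message instead of producing no output for 900 seconds.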



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28440) EventTimeWindowCheckpointingITCase failed with restore

2024-05-27 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849759#comment-17849759
 ] 

Ryan Skraba commented on FLINK-28440:
-

* 1.19 Java 21 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9232147048/job/25403143624#step:10:8022

> EventTimeWindowCheckpointingITCase failed with restore
> --
>
> Key: FLINK-28440
> URL: https://issues.apache.org/jira/browse/FLINK-28440
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing, Runtime / State Backends
> Affects Versions: 1.16.0, 1.17.0, 1.18.0, 1.19.0
> Reporter: Huang Xingbo
> Assignee: Yanfei Lei
> Priority: Critical
> Labels: auto-deprioritized-critical, pull-request-available, 
> stale-assigned, test-stability
> Fix For: 1.20.0
>
> Attachments: image-2023-02-01-00-51-54-506.png, 
> image-2023-02-01-01-10-01-521.png, image-2023-02-01-01-19-12-182.png, 
> image-2023-02-01-16-47-23-756.png, image-2023-02-01-16-57-43-889.png, 
> image-2023-02-02-10-52-56-599.png, image-2023-02-03-10-09-07-586.png, 
> image-2023-02-03-12-03-16-155.png, image-2023-02-03-12-03-56-614.png
>
>
> {code:java}
> Caused by: java.lang.Exception: Exception while creating 
> StreamOperatorStateContext.
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:256)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:722)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:698)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:665)
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935)
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:904)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_0a448493b4782967b150582570326227_(2/4) from 
> any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:353)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:165)
>   ... 11 more
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
> /tmp/junit1835099326935900400/junit1113650082510421526/52ee65b7-033f-4429-8ddd-adbe85e27ced
>  (No such file or directory)
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.advance(StateChangelogHandleStreamHandleReader.java:87)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.hasNext(StateChangelogHandleStreamHandleReader.java:69)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.readBackendHandle(ChangelogBackendRestoreOperation.java:96)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.restore(ChangelogBackendRestoreOperation.java:75)
>   at 
> org.apache.flink.state.changelog.ChangelogStateBackend.restore(ChangelogStateBackend.java:92)
>   at 
> org.apache.flink.state.changelog.AbstractChangelogStateBackend.createKeyedStateBackend(AbstractChangelogStateBackend.java:136)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:336)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   ... 13 more
> Caused by: java.io.FileNotFoundException: 
> 

[jira] [Commented] (FLINK-35002) GitHub action request timeout to ArtifactService

2024-05-27 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849763#comment-17849763
 ] 

Ryan Skraba commented on FLINK-35002:
-

* 1.20 Java 11 / Compile 
https://github.com/apache/flink/commit/f860631c523c1d446c0d01046f0fbe6055174dc6/checks/25438061803/logs
* 1.19 Java 17 / Compile 
https://github.com/apache/flink/commit/a450980de65eaead734349ed44452f572e5e329d/checks/25402960967/logs

> GitHub action request timeout to ArtifactService
> -
>
> Key: FLINK-35002
> URL: https://issues.apache.org/jira/browse/FLINK-35002
> Project: Flink
> Issue Type: Bug
> Components: Build System
> Reporter: Ryan Skraba
> Priority: Major
> Labels: github-actions, test-stability
>
> A timeout can occur when uploading a successfully built artifact:
>  * [https://github.com/apache/flink/actions/runs/8516411871/job/23325392650]
> {code:java}
> 2024-04-02T02:20:15.6355368Z With the provided path, there will be 1 file 
> uploaded
> 2024-04-02T02:20:15.6360133Z Artifact name is valid!
> 2024-04-02T02:20:15.6362872Z Root directory input is valid!
> 2024-04-02T02:20:20.6975036Z Attempt 1 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 3000 ms...
> 2024-04-02T02:20:28.7084937Z Attempt 2 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 4785 ms...
> 2024-04-02T02:20:38.5015936Z Attempt 3 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 7375 ms...
> 2024-04-02T02:20:50.8901508Z Attempt 4 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 14988 ms...
> 2024-04-02T02:21:10.9028438Z ##[error]Failed to CreateArtifact: Failed to 
> make request after 5 attempts: Request timeout: 
> /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact
> 2024-04-02T02:22:59.9893296Z Post job cleanup.
> 2024-04-02T02:22:59.9958844Z Post job cleanup. {code}
> (This is unlikely to be something we can fix, but we can track it.)
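
For readers unfamiliar with the pattern in the log above: the upload is retried a fixed number of times, with a growing pause between attempts, before the step fails. A rough sketch of that pattern follows. It is an assumption-laden illustration, not the upload action's actual code: the class name, the plain exponential policy and the factor are hypothetical, and the delays in the log do not follow an exact doubling.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of "retry N times with growing delay"; not the GitHub action's implementation.
public class RetrySketch {

    /** Delay before the next attempt: base * factor^(attempt - 1), with attempt starting at 1. */
    public static long delayMillis(long baseMillis, double factor, int attempt) {
        return Math.round(baseMillis * Math.pow(factor, attempt - 1));
    }

    /** Calls the action until it succeeds or maxAttempts is exhausted. */
    public static <T> T retry(Callable<T> action, int maxAttempts, long baseMillis, double factor)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                System.err.println(
                        "Attempt " + attempt + " of " + maxAttempts + " failed: " + e.getMessage());
                if (attempt < maxAttempts) {
                    // Pause before the next attempt; no pause after the final one.
                    Thread.sleep(delayMillis(baseMillis, factor, attempt));
                }
            }
        }
        throw new Exception("Failed after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice, then succeeds on the third call.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("Request timeout");
            }
            return "uploaded";
        }, 5, 10, 2.0);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```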





[jira] [Commented] (FLINK-35012) ChangelogNormalizeRestoreTest.testRestore failure

2024-05-27 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849764#comment-17849764
 ] 

Ryan Skraba commented on FLINK-35012:
-

* 1.20 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/9239908683/job/25419731096#step:10:10621

> ChangelogNormalizeRestoreTest.testRestore failure
> -
>
> Key: FLINK-35012
> URL: https://issues.apache.org/jira/browse/FLINK-35012
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Affects Versions: 1.20.0
> Reporter: Ryan Skraba
> Priority: Critical
> Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58716=logs=0c940707-2659-5648-cbe6-a1ad63045f0a=075c2716-8010-5565-fe08-3c4bb45824a4=11921
> {code}
> Apr 03 22:57:43 22:57:43.159 [ERROR] Failures: 
> Apr 03 22:57:43 22:57:43.160 [ERROR]   
> ChangelogNormalizeRestoreTest>RestoreTestBase.testRestore:337 
> Apr 03 22:57:43 Expecting actual:
> Apr 03 22:57:43   ["+I[two, 2, b]",
> Apr 03 22:57:43 "+I[one, 1, a]",
> Apr 03 22:57:43 "+I[three, 3, c]",
> Apr 03 22:57:43 "-U[one, 1, a]",
> Apr 03 22:57:43 "+U[one, 1, aa]",
> Apr 03 22:57:43 "-U[three, 3, c]",
> Apr 03 22:57:43 "+U[three, 3, cc]",
> Apr 03 22:57:43 "-D[two, 2, b]",
> Apr 03 22:57:43 "+I[four, 4, d]",
> Apr 03 22:57:43 "+I[five, 5, e]",
> Apr 03 22:57:43 "-U[four, 4, d]",
> Apr 03 22:57:43 "+U[four, 4, dd]"]
> Apr 03 22:57:43 to contain exactly in any order:
> Apr 03 22:57:43   ["+I[one, 1, a]",
> Apr 03 22:57:43 "+I[two, 2, b]",
> Apr 03 22:57:43 "-U[one, 1, a]",
> Apr 03 22:57:43 "+U[one, 1, aa]",
> Apr 03 22:57:43 "+I[three, 3, c]",
> Apr 03 22:57:43 "-D[two, 2, b]",
> Apr 03 22:57:43 "-U[three, 3, c]",
> Apr 03 22:57:43 "+U[three, 3, cc]",
> Apr 03 22:57:43 "+I[four, 4, d]",
> Apr 03 22:57:43 "+I[five, 5, e]",
> Apr 03 22:57:43 "-U[four, 4, d]",
> Apr 03 22:57:43 "+U[four, 4, dd]",
> Apr 03 22:57:43 "+I[six, 6, f]",
> Apr 03 22:57:43 "-D[six, 6, f]"]
> Apr 03 22:57:43 but could not find the following elements:
> Apr 03 22:57:43   ["+I[six, 6, f]", "-D[six, 6, f]"]
> Apr 03 22:57:43 
> {code}
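
To make the assertion output above easier to read: "contain exactly in any order" is a multiset comparison, so each expected element has to be matched by its own occurrence in the actual list. A small sketch of how the "could not find" list can be derived follows. The class and method names are hypothetical and this is not AssertJ's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: computes which expected rows have no matching occurrence in the actual rows.
public class MissingElementsSketch {

    public static List<String> missing(List<String> actual, List<String> expected) {
        // Work on a copy so each actual element can satisfy at most one expected element.
        List<String> remaining = new ArrayList<>(actual);
        List<String> notFound = new ArrayList<>();
        for (String row : expected) {
            // remove(Object) drops the first equal element and reports whether one existed.
            if (!remaining.remove(row)) {
                notFound.add(row);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        List<String> actual = Arrays.asList("+I[four, 4, d]", "+I[five, 5, e]");
        List<String> expected =
                Arrays.asList("+I[four, 4, d]", "+I[five, 5, e]", "+I[six, 6, f]", "-D[six, 6, f]");
        System.out.println(missing(actual, expected));
    }
}
```

In the failure above, the actual output simply ends before the two records for key {{six}}, which is why only those two appear as missing.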





[jira] [Commented] (FLINK-34224) ChangelogStorageMetricsTest.testAttemptsPerUpload(ChangelogStorageMetricsTest timed out

2024-05-27 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849760#comment-17849760
 ] 

Ryan Skraba commented on FLINK-34224:
-

* 1.20 Hadoop 3.1.3 / Test (module: core) 
https://github.com/apache/flink/actions/runs/9239908683/job/25419763061#step:10:12699

> ChangelogStorageMetricsTest.testAttemptsPerUpload(ChangelogStorageMetricsTest 
> timed out
> ---
>
> Key: FLINK-34224
> URL: https://issues.apache.org/jira/browse/FLINK-34224
> Project: Flink
> Issue Type: Bug
> Components: Runtime / State Backends
> Affects Versions: 1.19.0, 1.18.1
> Reporter: Matthias Pohl
> Priority: Major
> Labels: github-actions, test-stability
>
> The timeout appeared in the GitHub Actions workflow (currently in test phase; 
> [FLIP-396|https://cwiki.apache.org/confluence/display/FLINK/FLIP-396%3A+Trial+to+test+GitHub+Actions+as+an+alternative+for+Flink%27s+current+Azure+CI+infrastructure]):
> https://github.com/XComp/flink/actions/runs/7632434859/job/20793613726#step:10:11040
> {code}
> Jan 24 01:38:36 "ForkJoinPool-1-worker-1" #16 daemon prio=5 os_prio=0 
> tid=0x7f3b200ae800 nid=0x406e3 waiting on condition [0x7f3b1ba0e000]
> Jan 24 01:38:36    java.lang.Thread.State: WAITING (parking)
> Jan 24 01:38:36   at sun.misc.Unsafe.park(Native Method)
> Jan 24 01:38:36   - parking to wait for  <0xdfbbb358> (a 
> java.util.concurrent.CompletableFuture$Signaller)
> Jan 24 01:38:36   at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> Jan 24 01:38:36   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> Jan 24 01:38:36   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3313)
> Jan 24 01:38:36   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> Jan 24 01:38:36   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> Jan 24 01:38:36   at 
> org.apache.flink.changelog.fs.ChangelogStorageMetricsTest.testAttemptsPerUpload(ChangelogStorageMetricsTest.java:251)
> Jan 24 01:38:36   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> [...]
> {code}





[jira] [Commented] (FLINK-34645) StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount fails

2024-05-27 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849762#comment-17849762
 ] 

Ryan Skraba commented on FLINK-34645:
-

* 1.18 Hadoop 3.1.3 / Test (module: misc) 
https://github.com/apache/flink/actions/runs/9232146944

> StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount
>  fails
> 
>
> Key: FLINK-34645
> URL: https://issues.apache.org/jira/browse/FLINK-34645
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Runtime
> Affects Versions: 1.18.1
> Reporter: Matthias Pohl
> Priority: Major
> Labels: github-actions, test-stability
>
> {code}
> Error: 02:27:17 02:27:17.025 [ERROR] Tests run: 3, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 0.658 s <<< FAILURE! - in 
> org.apache.flink.table.runtime.operators.python.aggregate.arrow.stream.StreamArrowPythonGroupWindowAggregateFunctionOperatorTest
> Error: 02:27:17 02:27:17.025 [ERROR] 
> org.apache.flink.table.runtime.operators.python.aggregate.arrow.stream.StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount
>   Time elapsed: 0.3 s  <<< FAILURE!
> Mar 09 02:27:17 java.lang.AssertionError: 
> Mar 09 02:27:17 
> Mar 09 02:27:17 Expected size: 8 but was: 6 in:
> Mar 09 02:27:17 [Record @ (undef) : 
> +I(c1,0,1969-12-31T23:59:55,1970-01-01T00:00:05),
> Mar 09 02:27:17 Record @ (undef) : 
> +I(c2,3,1969-12-31T23:59:55,1970-01-01T00:00:05),
> Mar 09 02:27:17 Record @ (undef) : 
> +I(c2,3,1970-01-01T00:00,1970-01-01T00:00:10),
> Mar 09 02:27:17 Record @ (undef) : 
> +I(c1,0,1970-01-01T00:00,1970-01-01T00:00:10),
> Mar 09 02:27:17 Watermark @ 1,
> Mar 09 02:27:17 Watermark @ 2]
> Mar 09 02:27:17   at 
> org.apache.flink.table.runtime.util.RowDataHarnessAssertor.assertOutputEquals(RowDataHarnessAssertor.java:110)
> Mar 09 02:27:17   at 
> org.apache.flink.table.runtime.util.RowDataHarnessAssertor.assertOutputEquals(RowDataHarnessAssertor.java:70)
> Mar 09 02:27:17   at 
> org.apache.flink.table.runtime.operators.python.aggregate.arrow.ArrowPythonAggregateFunctionOperatorTestBase.assertOutputEquals(ArrowPythonAggregateFunctionOperatorTestBase.java:62)
> Mar 09 02:27:17   at 
> org.apache.flink.table.runtime.operators.python.aggregate.arrow.stream.StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount(StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.java:326)
> Mar 09 02:27:17   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> [...]
> {code}





[jira] [Commented] (FLINK-34227) Job doesn't disconnect from ResourceManager

2024-05-27 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849761#comment-17849761
 ] 

Ryan Skraba commented on FLINK-34227:
-

* 1.18 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/9248172203/job/25438330034#step:10:15163
* 1.18 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/9239908314/job/25419753266#step:10:12055

> Job doesn't disconnect from ResourceManager
> ---
>
> Key: FLINK-34227
> URL: https://issues.apache.org/jira/browse/FLINK-34227
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.19.0, 1.18.1
> Reporter: Matthias Pohl
> Assignee: Matthias Pohl
> Priority: Critical
> Labels: github-actions, pull-request-available, test-stability
> Attachments: FLINK-34227.7e7d69daebb438b8d03b7392c9c55115.log, 
> FLINK-34227.log
>
>
> https://github.com/XComp/flink/actions/runs/7634987973/job/20800205972#step:10:14557
> {code}
> [...]
> "main" #1 prio=5 os_prio=0 tid=0x7f4b7000 nid=0x24ec0 waiting on 
> condition [0x7fccce1eb000]
>    java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xbdd52618> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2131)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2099)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2077)
>   at 
> org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:876)
>   at 
> org.apache.flink.table.planner.runtime.stream.sql.WindowDistinctAggregateITCase.testHopWindow_Cube(WindowDistinctAggregateITCase.scala:550)
> [...]
> {code}





[jira] [Commented] (FLINK-18476) PythonEnvUtilsTest#testStartPythonProcess fails

2024-05-27 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849758#comment-17849758
 ] 

Ryan Skraba commented on FLINK-18476:
-

* 1.20 Java 21 / Test (module: misc) 
https://github.com/apache/flink/actions/runs/9232146809/job/25403134721#step:10:11960

> PythonEnvUtilsTest#testStartPythonProcess fails
> ---
>
> Key: FLINK-18476
> URL: https://issues.apache.org/jira/browse/FLINK-18476
> Project: Flink
> Issue Type: Bug
> Components: API / Python, Tests
> Affects Versions: 1.11.0, 1.15.3, 1.18.0, 1.19.0, 1.20.0
> Reporter: Dawid Wysakowicz
> Priority: Major
> Labels: auto-deprioritized-major, auto-deprioritized-minor, 
> test-stability
>
> The 
> {{org.apache.flink.client.python.PythonEnvUtilsTest#testStartPythonProcess}} 
> failed in my local environment as it assumes the environment has 
> {{/usr/bin/python}}. 
> I don't know exactly how I got python on Ubuntu 20.04, but I only have an 
> alias {{python = python3}}. Therefore the test fails.
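
As a sketch of one way a test could avoid depending on a fixed interpreter location: look up candidate names on {{PATH}} and skip when none is found. A shell alias such as {{python = python3}} exists only inside the shell, so a lookup like this would find {{python3}} but not the alias. The class and method names are hypothetical and this is not how {{PythonEnvUtilsTest}} is written.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper: finds the first executable among candidate names in a PATH-style string.
public class PythonLookupSketch {

    public static Optional<String> findOnPath(List<String> candidates, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        String[] dirs = pathValue.split(Pattern.quote(File.pathSeparator));
        for (String name : candidates) {
            for (String dir : dirs) {
                if (dir.isEmpty()) {
                    continue; // skip empty PATH entries
                }
                File candidate = new File(dir, name);
                if (candidate.isFile() && candidate.canExecute()) {
                    return Optional.of(candidate.getAbsolutePath());
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Candidate order expresses preference; the result depends on the local machine.
        Optional<String> python =
                findOnPath(Arrays.asList("python", "python3"), System.getenv("PATH"));
        System.out.println(python.orElse("no python executable found on PATH"));
    }
}
```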





[jira] [Commented] (FLINK-35446) FileMergingSnapshotManagerBase throws a NullPointerException

2024-05-24 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849285#comment-17849285
 ] 

Ryan Skraba commented on FLINK-35446:
-

[~lijinzhong] or [~zakelly], do you think this needs a fix similar to the one 
for FLINK-35382? 

> FileMergingSnapshotManagerBase throws a NullPointerException
> 
>
> Key: FLINK-35446
> URL: https://issues.apache.org/jira/browse/FLINK-35446
> Project: Flink
> Issue Type: Bug
> Reporter: Ryan Skraba
> Priority: Critical
> Labels: test-stability
>
> * 1.20 Java 11 / Test (module: tests) 
> https://github.com/apache/flink/actions/runs/9217608897/job/25360103124#step:10:8641
> {{ResumeCheckpointManuallyITCase.testExternalizedIncrementalRocksDBCheckpointsWithLocalRecoveryZookeeper}}
>  throws a NullPointerException when it tries to restore state handles: 
> {code}
> Error: 02:57:52 02:57:52.551 [ERROR] Tests run: 48, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 268.6 s <<< FAILURE! -- in 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase
> Error: 02:57:52 02:57:52.551 [ERROR] 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedIncrementalRocksDBCheckpointsWithLocalRecoveryZookeeper[RestoreMode
>  = CLAIM] -- Time elapsed: 3.145 s <<< ERROR!
> May 24 02:57:52 org.apache.flink.runtime.JobException: Recovery is suppressed 
> by NoRestartBackoffTimeStrategy
> May 24 02:57:52   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:219)
> May 24 02:57:52   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailureAndReport(ExecutionFailureHandler.java:166)
> May 24 02:57:52   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:121)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.recordTaskFailure(DefaultScheduler.java:279)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:270)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.onTaskFailed(DefaultScheduler.java:263)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.onTaskExecutionStateUpdate(SchedulerBase.java:788)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:765)
> May 24 02:57:52   at 
> org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:83)
> May 24 02:57:52   at 
> org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:496)
> May 24 02:57:52   at 
> jdk.internal.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
> May 24 02:57:52   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> May 24 02:57:52   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:566)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRpcInvocation$1(PekkoRpcActor.java:318)
> May 24 02:57:52   at 
> org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:316)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:229)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor.handleRpcMessage(FencedPekkoRpcActor.java:88)
> May 24 02:57:52   at 
> org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:174)
> May 24 02:57:52   at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33)
> May 24 02:57:52   at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29)
> May 24 02:57:52   at 
> scala.PartialFunction.applyOrElse(PartialFunction.scala:127)
> May 24 02:57:52   at 
> scala.PartialFunction.applyOrElse$(PartialFunction.scala:126)
> May 24 02:57:52   at 
> org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29)
> May 24 02:57:52   at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175)
> May 24 02:57:52   at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
> May 24 02:57:52   at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
> May 24 02:57:52   at 
> org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547)
> May 24 02:57:52   at 
> 

[jira] [Commented] (FLINK-35446) FileMergingSnapshotManagerBase throws a NullPointerException

2024-05-24 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849284#comment-17849284
 ] 

Ryan Skraba commented on FLINK-35446:
-

* 1.20 Java 11 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9217608897/job/25360103124#step:10:8641
* 1.20 Default (Java 8) / Test (module: table) 
https://github.com/apache/flink/actions/runs/9219075449/job/25363874486#step:10:11847
 {{PruneAggregateCallITCase.testNoneEmptyGroupKey}}
* 1.20 Default (Java 8) / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9219075449/job/25363874825#step:10:8005

The last one is different from the others: 
{code}
Error: 05:48:38 05:48:38.790 [ERROR] Tests run: 11, Failures: 1, Errors: 0, 
Skipped: 0, Time elapsed: 12.78 s <<< FAILURE! -- in 
org.apache.flink.test.classloading.ClassLoaderITCase
Error: 05:48:38 05:48:38.790 [ERROR] 
org.apache.flink.test.classloading.ClassLoaderITCase.testCheckpointedStreamingClassloaderJobWithCustomClassLoader
 -- Time elapsed: 2.492 s <<< FAILURE!
May 24 05:48:38 org.assertj.core.error.AssertJMultipleFailuresError: 
May 24 05:48:38 
May 24 05:48:38 Multiple Failures (1 failure)
May 24 05:48:38 -- failure 1 --
May 24 05:48:38 [Any cause is instance of class 'class 
org.apache.flink.util.SerializedThrowable' and contains message 
'org.apache.flink.test.classloading.jar.CheckpointedStreamingProgram$SuccessException']
 
May 24 05:48:38 Expecting any element of:
May 24 05:48:38   [org.apache.flink.client.program.ProgramInvocationException: 
The main method caused an error: Job execution failed.
May 24 05:48:38 at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:373)
May 24 05:48:38 at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:223)
May 24 05:48:38 at 
org.apache.flink.test.classloading.ClassLoaderITCase.lambda$testCheckpointedStreamingClassloaderJobWithCustomClassLoader$1(ClassLoaderITCase.java:260)
May 24 05:48:38 ...(54 remaining lines not displayed - this can be 
changed with Assertions.setMaxStackTraceElementsDisplayed),
May 24 05:48:38 org.apache.flink.runtime.client.JobExecutionException: Job 
execution failed.
May 24 05:48:38 at 
org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
May 24 05:48:38 at 
org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
May 24 05:48:38 at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
May 24 05:48:38 ...(45 remaining lines not displayed - this can be 
changed with Assertions.setMaxStackTraceElementsDisplayed),
May 24 05:48:38 org.apache.flink.runtime.JobException: Recovery is 
suppressed by FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=1, 
backoffTimeMS=100)
May 24 05:48:38 at 
org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:219)
May 24 05:48:38 at 
org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailureAndReport(ExecutionFailureHandler.java:166)
May 24 05:48:38 at 
org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:121)
May 24 05:48:38 ...(36 remaining lines not displayed - this can be 
changed with Assertions.setMaxStackTraceElementsDisplayed),
May 24 05:48:38 java.lang.NullPointerException
May 24 05:48:38 at 
org.apache.flink.runtime.checkpoint.filemerging.FileMergingSnapshotManagerBase.isManagedByFileMergingManager(FileMergingSnapshotManagerBase.java:733)
May 24 05:48:38 at 
org.apache.flink.runtime.checkpoint.filemerging.FileMergingSnapshotManagerBase.lambda$null$4(FileMergingSnapshotManagerBase.java:687)
May 24 05:48:38 at java.util.HashMap.computeIfAbsent(HashMap.java:1128)
May 24 05:48:38 ...(41 remaining lines not displayed - this can be 
changed with Assertions.setMaxStackTraceElementsDisplayed)]
May 24 05:48:38 to satisfy the given assertions requirements but none did:
May 24 05:48:38 
May 24 05:48:38 org.apache.flink.client.program.ProgramInvocationException: The 
main method caused an error: Job execution failed.
May 24 05:48:38 at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:373)
May 24 05:48:38 at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:223)
May 24 05:48:38 at 
org.apache.flink.test.classloading.ClassLoaderITCase.lambda$testCheckpointedStreamingClassloaderJobWithCustomClassLoader$1(ClassLoaderITCase.java:260)
May 24 05:48:38 ...(54 remaining lines not displayed - this can be 
changed with 

[jira] [Commented] (FLINK-35342) MaterializedTableStatementITCase test can check for wrong status

2024-05-24 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849283#comment-17849283
 ] 

Ryan Skraba commented on FLINK-35342:
-

* 1.20 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/9217608897/job/25360076574#step:10:12483

> MaterializedTableStatementITCase test can check for wrong status
> 
>
> Key: FLINK-35342
> URL: https://issues.apache.org/jira/browse/FLINK-35342
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.20.0
>Reporter: Ryan Skraba
>Assignee: Feng Jin
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> * 1.20 AdaptiveScheduler / Test (module: table) 
> https://github.com/apache/flink/actions/runs/9056197319/job/24879135605#step:10:12490
>  
> It looks like 
> {{MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume}}
>  can be flaky, where the expected status is not yet RUNNING:
> {code}
> Error: 03:24:03 03:24:03.902 [ERROR] Tests run: 6, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 26.78 s <<< FAILURE! -- in 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase
> Error: 03:24:03 03:24:03.902 [ERROR] 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(Path,
>  RestClusterClient) -- Time elapsed: 3.850 s <<< FAILURE!
> May 13 03:24:03 org.opentest4j.AssertionFailedError: 
> May 13 03:24:03 
> May 13 03:24:03 expected: "RUNNING"
> May 13 03:24:03  but was: "CREATED"
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> May 13 03:24:03   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> May 13 03:24:03   at 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(MaterializedTableStatementITCase.java:650)
> May 13 03:24:03   at java.lang.reflect.Method.invoke(Method.java:498)
> May 13 03:24:03   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> May 13 03:24:03 
> May 13 03:24:04 03:24:04.270 [INFO] 
> May 13 03:24:04 03:24:04.270 [INFO] Results:
> May 13 03:24:04 03:24:04.270 [INFO] 
> Error: 03:24:04 03:24:04.270 [ERROR] Failures: 
> Error: 03:24:04 03:24:04.271 [ERROR]   
> MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume:650
>  
> May 13 03:24:04 expected: "RUNNING"
> May 13 03:24:04  but was: "CREATED"
> May 13 03:24:04 03:24:04.271 [INFO] 
> Error: 03:24:04 03:24:04.271 [ERROR] Tests run: 82, Failures: 1, Errors: 0, 
> Skipped: 0
> {code}
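
The flakiness described above (the status is read while it is still "CREATED") is the usual symptom of asserting on an asynchronous state too early. As a minimal sketch only, and not the fix applied in Flink, a test can poll until the expected value appears or a timeout expires. The class name {{StatusPollingSketch}} and the helper {{pollUntilEquals}} are hypothetical and do not exist in Flink:

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not part of Flink's test utilities. */
public class StatusPollingSketch {

    /**
     * Polls the supplier until it returns the expected value or the timeout
     * expires, and returns the last observed value either way.
     */
    public static <T> T pollUntilEquals(
            Supplier<T> supplier, T expected, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        T last = supplier.get();
        while (!expected.equals(last) && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop polling.
                Thread.currentThread().interrupt();
                return last;
            }
            last = supplier.get();
        }
        return last;
    }

    public static void main(String[] args) {
        // Simulated job status that reports CREATED twice before RUNNING.
        Iterator<String> statuses =
                Arrays.asList("CREATED", "CREATED", "RUNNING").iterator();
        String status =
                pollUntilEquals(
                        statuses::next,
                        "RUNNING",
                        Duration.ofSeconds(5),
                        Duration.ofMillis(10));
        System.out.println(status); // prints "RUNNING"
    }
}
```

Asserting on the returned value rather than on a single read keeps the failure message meaningful: on timeout the assertion still reports the last status that was actually observed.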



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28440) EventTimeWindowCheckpointingITCase failed with restore

2024-05-24 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849282#comment-17849282
 ] 

Ryan Skraba commented on FLINK-28440:
-

* 1.19 Hadoop 3.1.3 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9217608890/job/25360146799#step:10:8157

> EventTimeWindowCheckpointingITCase failed with restore
> --
>
> Key: FLINK-28440
> URL: https://issues.apache.org/jira/browse/FLINK-28440
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / State Backends
>Affects Versions: 1.16.0, 1.17.0, 1.18.0, 1.19.0
>Reporter: Huang Xingbo
>Assignee: Yanfei Lei
>Priority: Critical
>  Labels: auto-deprioritized-critical, pull-request-available, 
> stale-assigned, test-stability
> Fix For: 1.20.0
>
> Attachments: image-2023-02-01-00-51-54-506.png, 
> image-2023-02-01-01-10-01-521.png, image-2023-02-01-01-19-12-182.png, 
> image-2023-02-01-16-47-23-756.png, image-2023-02-01-16-57-43-889.png, 
> image-2023-02-02-10-52-56-599.png, image-2023-02-03-10-09-07-586.png, 
> image-2023-02-03-12-03-16-155.png, image-2023-02-03-12-03-56-614.png
>
>
> {code:java}
> Caused by: java.lang.Exception: Exception while creating 
> StreamOperatorStateContext.
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:256)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:722)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:698)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:665)
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935)
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:904)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_0a448493b4782967b150582570326227_(2/4) from 
> any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:353)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:165)
>   ... 11 more
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
> /tmp/junit1835099326935900400/junit1113650082510421526/52ee65b7-033f-4429-8ddd-adbe85e27ced
>  (No such file or directory)
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.advance(StateChangelogHandleStreamHandleReader.java:87)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.hasNext(StateChangelogHandleStreamHandleReader.java:69)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.readBackendHandle(ChangelogBackendRestoreOperation.java:96)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.restore(ChangelogBackendRestoreOperation.java:75)
>   at 
> org.apache.flink.state.changelog.ChangelogStateBackend.restore(ChangelogStateBackend.java:92)
>   at 
> org.apache.flink.state.changelog.AbstractChangelogStateBackend.createKeyedStateBackend(AbstractChangelogStateBackend.java:136)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:336)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   ... 13 more
> Caused by: java.io.FileNotFoundException: 
> 

[jira] [Created] (FLINK-35446) FileMergingSnapshotManagerBase throws a NullPointerException

2024-05-24 Thread Ryan Skraba (Jira)
Ryan Skraba created FLINK-35446:
---

 Summary: FileMergingSnapshotManagerBase throws a 
NullPointerException
 Key: FLINK-35446
 URL: https://issues.apache.org/jira/browse/FLINK-35446
 Project: Flink
  Issue Type: Bug
Reporter: Ryan Skraba


* 1.20 Java 11 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9217608897/job/25360103124#step:10:8641

{{ResumeCheckpointManuallyITCase.testExternalizedIncrementalRocksDBCheckpointsWithLocalRecoveryZookeeper}}
 throws a NullPointerException when it tries to restore state handles: 

{code}
Error: 02:57:52 02:57:52.551 [ERROR] Tests run: 48, Failures: 0, Errors: 1, 
Skipped: 0, Time elapsed: 268.6 s <<< FAILURE! -- in 
org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase
Error: 02:57:52 02:57:52.551 [ERROR] 
org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedIncrementalRocksDBCheckpointsWithLocalRecoveryZookeeper[RestoreMode
 = CLAIM] -- Time elapsed: 3.145 s <<< ERROR!
May 24 02:57:52 org.apache.flink.runtime.JobException: Recovery is suppressed 
by NoRestartBackoffTimeStrategy
May 24 02:57:52 at 
org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:219)
May 24 02:57:52 at 
org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailureAndReport(ExecutionFailureHandler.java:166)
May 24 02:57:52 at 
org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:121)
May 24 02:57:52 at 
org.apache.flink.runtime.scheduler.DefaultScheduler.recordTaskFailure(DefaultScheduler.java:279)
May 24 02:57:52 at 
org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:270)
May 24 02:57:52 at 
org.apache.flink.runtime.scheduler.DefaultScheduler.onTaskFailed(DefaultScheduler.java:263)
May 24 02:57:52 at 
org.apache.flink.runtime.scheduler.SchedulerBase.onTaskExecutionStateUpdate(SchedulerBase.java:788)
May 24 02:57:52 at 
org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:765)
May 24 02:57:52 at 
org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:83)
May 24 02:57:52 at 
org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:496)
May 24 02:57:52 at 
jdk.internal.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
May 24 02:57:52 at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
May 24 02:57:52 at 
java.base/java.lang.reflect.Method.invoke(Method.java:566)
May 24 02:57:52 at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRpcInvocation$1(PekkoRpcActor.java:318)
May 24 02:57:52 at 
org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
May 24 02:57:52 at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcInvocation(PekkoRpcActor.java:316)
May 24 02:57:52 at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:229)
May 24 02:57:52 at 
org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor.handleRpcMessage(FencedPekkoRpcActor.java:88)
May 24 02:57:52 at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:174)
May 24 02:57:52 at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33)
May 24 02:57:52 at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29)
May 24 02:57:52 at 
scala.PartialFunction.applyOrElse(PartialFunction.scala:127)
May 24 02:57:52 at 
scala.PartialFunction.applyOrElse$(PartialFunction.scala:126)
May 24 02:57:52 at 
org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29)
May 24 02:57:52 at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175)
May 24 02:57:52 at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
May 24 02:57:52 at 
scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
May 24 02:57:52 at 
org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547)
May 24 02:57:52 at 
org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545)
May 24 02:57:52 at 
org.apache.pekko.actor.AbstractActor.aroundReceive(AbstractActor.scala:229)
May 24 02:57:52 at 
org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590)
May 24 02:57:52 at 
org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557)
May 24 02:57:52 at 
org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280)
May 24 02:57:52 at 

[jira] [Created] (FLINK-35438) SourceCoordinatorTest.testErrorThrownFromSplitEnumerator fails on wrong error

2024-05-23 Thread Ryan Skraba (Jira)
Ryan Skraba created FLINK-35438:
---

 Summary: SourceCoordinatorTest.testErrorThrownFromSplitEnumerator 
fails on wrong error
 Key: FLINK-35438
 URL: https://issues.apache.org/jira/browse/FLINK-35438
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.18.2
Reporter: Ryan Skraba


* 1.18 Java 11 / Test (module: core) 
https://github.com/apache/flink/actions/runs/9201159842/job/25309197630#step:10:7375

The test expects an artificial {{Error("Test Error")}} to be reported as the 
cause of the job failure, but the reported cause is null:

{code}
Error: 02:32:31 02:32:31.950 [ERROR] Tests run: 18, Failures: 1, Errors: 0, 
Skipped: 0, Time elapsed: 0.187 s <<< FAILURE! - in 
org.apache.flink.runtime.source.coordinator.SourceCoordinatorTest
Error: 02:32:31 02:32:31.950 [ERROR] 
org.apache.flink.runtime.source.coordinator.SourceCoordinatorTest.testErrorThrownFromSplitEnumerator
  Time elapsed: 0.01 s  <<< FAILURE!
May 23 02:32:31 org.opentest4j.AssertionFailedError: 
May 23 02:32:31 
May 23 02:32:31 expected: 
May 23 02:32:31   java.lang.Error: Test Error
May 23 02:32:31 at 
org.apache.flink.runtime.source.coordinator.SourceCoordinatorTest.testErrorThrownFromSplitEnumerator(SourceCoordinatorTest.java:296)
May 23 02:32:31 at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
May 23 02:32:31 at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
May 23 02:32:31 ...(57 remaining lines not displayed - this can be 
changed with Assertions.setMaxStackTraceElementsDisplayed)
May 23 02:32:31  but was: 
May 23 02:32:31   null
May 23 02:32:31 at 
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)
May 23 02:32:31 at 
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
May 23 02:32:31 at 
java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
May 23 02:32:31 at 
org.apache.flink.runtime.source.coordinator.SourceCoordinatorTest.testErrorThrownFromSplitEnumerator(SourceCoordinatorTest.java:322)
May 23 02:32:31 at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
May 23 02:32:31 at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
May 23 02:32:31 at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
May 23 02:32:31 at 
java.base/java.lang.reflect.Method.invoke(Method.java:566)
May 23 02:32:31 at 
org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727)
May 23 02:32:31 at 
org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
May 23 02:32:31 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
May 23 02:32:31 at 
org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
May 23 02:32:31 at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
May 23 02:32:31 at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
May 23 02:32:31 at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
May 23 02:32:31 at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
May 23 02:32:31 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
May 23 02:32:31 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
May 23 02:32:31 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
May 23 02:32:31 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
May 23 02:32:31 at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
May 23 02:32:31 at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
{code}

This looks like a multithreading error in the test's 
{{MockOperatorCoordinatorContext}}, perhaps where {{isJobFailure}} can return 
true before the failure reason has been populated. I couldn't reproduce it in 
1M runs.
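
To illustrate the kind of race suspected here, the following is a minimal sketch under stated assumptions. The classes {{TwoFieldContext}} and {{SingleReferenceContext}} are hypothetical stand-ins written for this note; they are not Flink's {{MockOperatorCoordinatorContext}}, whose real fields and methods may differ. The first variant sets a flag and a cause in two separate writes, so a reader on another thread could see the flag before the cause. The second publishes both through a single reference:

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch; not Flink's MockOperatorCoordinatorContext. */
public class FailureContextSketch {

    /** Racy variant: the flag can become visible before the cause is stored. */
    public static class TwoFieldContext {
        private volatile boolean jobFailed;
        private volatile Throwable jobFailureReason;

        public void failJob(Throwable cause) {
            jobFailed = true;         // a concurrent reader may observe this ...
            jobFailureReason = cause; // ... before this write has happened
        }

        public boolean isJobFailed() {
            return jobFailed;
        }

        public Throwable getJobFailureReason() {
            return jobFailureReason;
        }
    }

    /** Safer variant: one reference carries both the flag and the cause. */
    public static class SingleReferenceContext {
        private final AtomicReference<Throwable> failure = new AtomicReference<>();

        public void failJob(Throwable cause) {
            failure.compareAndSet(null, cause); // the first failure wins
        }

        public boolean isJobFailed() {
            return failure.get() != null;
        }

        public Throwable getJobFailureReason() {
            return failure.get();
        }
    }

    public static void main(String[] args) {
        SingleReferenceContext ctx = new SingleReferenceContext();
        ctx.failJob(new Error("Test Error"));
        // prints "true Test Error"
        System.out.println(ctx.isJobFailed() + " " + ctx.getJobFailureReason().getMessage());
    }
}
```

With the single-reference variant, any thread that sees the job as failed is guaranteed to also see a non-null cause. A window as narrow as the one in the two-field variant would also be consistent with a failure that does not reproduce in 1M local runs.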





[jira] [Commented] (FLINK-35380) ResumeCheckpointManuallyITCase hanging on tests

2024-05-23 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848946#comment-17848946
 ] 

Ryan Skraba commented on FLINK-35380:
-

* 1.20 Hadoop 3.1.3 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9201159914/job/25309205615#step:10:12158

> ResumeCheckpointManuallyITCase hanging on tests 
> 
>
> Key: FLINK-35380
> URL: https://issues.apache.org/jira/browse/FLINK-35380
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.20.0
>Reporter: Ryan Skraba
>Priority: Critical
>  Labels: test-stability
>
> * 1.20 Default (Java 8) / Test (module: tests) 
> https://github.com/apache/flink/actions/runs/9105407291/job/25031170942#step:10:11841
>  
> (This is a slightly different error, waiting in a different place than 
> FLINK-28319)
> {code}
> May 16 03:23:58 
> ==
> May 16 03:23:58 Process produced no output for 900 seconds.
> May 16 03:23:58 
> ==
> ... snip until stack trace ...
> May 16 03:23:58   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> May 16 03:23:58   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> May 16 03:23:58   at 
> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:410)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:378)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:318)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFullRocksDBCheckpointsWithLocalRecoveryStandalone(ResumeCheckpointManuallyITCase.java:133)
> {code}





[jira] [Commented] (FLINK-35342) MaterializedTableStatementITCase test can check for wrong status

2024-05-23 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848945#comment-17848945
 ] 

Ryan Skraba commented on FLINK-35342:
-

* 1.20 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/9201159914/job/25309139160#step:10:12492






[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP

2024-05-23 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848944#comment-17848944
 ] 

Ryan Skraba commented on FLINK-33186:
-

* 1.19 Default (Java 8) / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9201085836/job/25309084773#step:10:8471

>  CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished 
> fails on AZP
> -
>
> Key: FLINK-33186
> URL: https://issues.apache.org/jira/browse/FLINK-33186
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.19.0, 1.18.1
>Reporter: Sergey Nuyanzin
>Assignee: Jiang Xin
>Priority: Critical
>  Labels: test-stability
>
> This build 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53509=logs=baf26b34-3c6a-54e8-f93f-cf269b32f802=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9=8762
> fails with:
> {noformat}
> Sep 28 01:23:43 Caused by: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Task local 
> checkpoint failure.
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.PendingCheckpoint.abort(PendingCheckpoint.java:550)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2248)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2235)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:817)
> Sep 28 01:23:43   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> Sep 28 01:23:43   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Sep 28 01:23:43   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> Sep 28 01:23:43   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> Sep 28 01:23:43   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> Sep 28 01:23:43   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> Sep 28 01:23:43   at java.lang.Thread.run(Thread.java:748)
> {noformat}





[jira] [Commented] (FLINK-28440) EventTimeWindowCheckpointingITCase failed with restore

2024-05-23 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848943#comment-17848943
 ] 

Ryan Skraba commented on FLINK-28440:
-

* 1.19 Java 21 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9201159696/job/25309170552#step:10:8003


[jira] [Commented] (FLINK-35428) WindowJoinITCase#testInnerJoin failed on AZP as NPE

2024-05-23 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848854#comment-17848854
 ] 

Ryan Skraba commented on FLINK-35428:
-

This could be a duplicate of FLINK-35418 (which occurred on 
EventTimeWindowCheckpointingITCase).

> WindowJoinITCase#testInnerJoin failed on AZP as NPE
> ---
>
> Key: FLINK-35428
> URL: https://issues.apache.org/jira/browse/FLINK-35428
> Project: Flink
> Issue Type: Bug
> Components: Build System / CI
> Affects Versions: 1.20.0
> Reporter: Weijie Guo
> Priority: Major
>
> {code:java}
> Caused by: java.lang.NullPointerException
> May 23 02:00:33   at 
> org.apache.flink.runtime.checkpoint.filemerging.PhysicalFile.deleteIfNecessary(PhysicalFile.java:155)
> May 23 02:00:33   at 
> org.apache.flink.runtime.checkpoint.filemerging.PhysicalFile.decRefCount(PhysicalFile.java:141)
> May 23 02:00:33   at 
> org.apache.flink.runtime.checkpoint.filemerging.LogicalFile.discardWithCheckpointId(LogicalFile.java:118)
> May 23 02:00:33   at 
> org.apache.flink.runtime.checkpoint.filemerging.FileMergingSnapshotManagerBase.discardSingleLogicalFile(FileMergingSnapshotManagerBase.java:574)
> May 23 02:00:33   at 
> org.apache.flink.runtime.checkpoint.filemerging.FileMergingSnapshotManagerBase.discardLogicalFiles(FileMergingSnapshotManagerBase.java:588)
> May 23 02:00:33   at 
> org.apache.flink.runtime.checkpoint.filemerging.FileMergingSnapshotManagerBase.notifyCheckpointAborted(FileMergingSnapshotManagerBase.java:490)
> May 23 02:00:33   at 
> org.apache.flink.runtime.checkpoint.filemerging.WithinCheckpointFileMergingSnapshotManager.notifyCheckpointAborted(WithinCheckpointFileMergingSnapshotManager.java:61)
> May 23 02:00:33   at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyFileMergingSnapshotManagerCheckpoint(SubtaskCheckpointCoordinatorImpl.java:505)
> May 23 02:00:33   at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpoint(SubtaskCheckpointCoordinatorImpl.java:490)
> May 23 02:00:33   at 
> org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpointAborted(SubtaskCheckpointCoordinatorImpl.java:414)
> May 23 02:00:33   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointAbortAsync$21(StreamTask.java:1513)
> May 23 02:00:33   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointOperation$23(StreamTask.java:1536)
> May 23 02:00:33   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
> May 23 02:00:33   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
> May 23 02:00:33   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMail(MailboxProcessor.java:398)
> May 23 02:00:33   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:367)
> May 23 02:00:33   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:352)
> May 23 02:00:33   at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:229)
> May 23 02:00:33   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:968)
> May 23 02:00:33   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:917)
> May 23 02:00:33   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:966)
> May 23 02:00:33   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:945)
> May 23 02:00:33   at 
> org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:759)
> May 23 02:00:33   at 
> org.apache.flink.runtime.taskmanager.Task.run(Task.java:573)
> May 23 02:00:33   at java.lang.Thread.run(Thread.java:748)
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59751&view=logs&j=a9db68b9-a7e0-54b6-0f98-010e0aff39e2&t=cdd32e0b-6047-565b-c58f-14054472f1be&l=11944
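
For context on the failure mode in the trace above: the NPE surfaces inside a reference-count decrement that triggers a delete while a checkpoint abort is being handled. The root cause is not established in this thread. The sketch below is NOT Flink's PhysicalFile. It is a hypothetical, self-contained class (RefCountedFile and Deleter are invented names) that shows one way such a path might dereference a missing cleanup callback, and a defensive guard against it.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a ref-counted file; NOT Flink's PhysicalFile. */
final class RefCountedFile {

    /** Hypothetical cleanup callback; assumed to be nullable for this illustration. */
    interface Deleter {
        void delete(String path);
    }

    private final String path;
    private final Deleter deleter; // may be null (assumption made for the sketch)
    private final AtomicInteger refCount = new AtomicInteger(0);
    private boolean deleted = false;

    RefCountedFile(String path, Deleter deleter) {
        this.path = path;
        this.deleter = deleter;
    }

    void incRefCount() {
        refCount.incrementAndGet();
    }

    /** Drops one reference and cleans up once the count reaches zero. */
    void decRefCount() {
        if (refCount.decrementAndGet() <= 0) {
            deleteIfNecessary();
        }
    }

    private synchronized void deleteIfNecessary() {
        if (deleted) {
            return; // already cleaned up, e.g. by a concurrent abort notification
        }
        // Guard: calling deleter.delete(path) unconditionally would throw a
        // NullPointerException whenever no deleter was registered.
        if (deleter != null) {
            deleter.delete(path);
        }
        deleted = true;
    }

    synchronized boolean isDeleted() {
        return deleted;
    }
}
```

With the guard in place, aborting a checkpoint whose file has no deleter marks the file as deleted instead of failing the task. Whether silently skipping the delete is acceptable depends on who owns the file, which this sketch does not model.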



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35342) MaterializedTableStatementITCase test can check for wrong status

2024-05-22 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848603#comment-17848603
 ] 

Ryan Skraba commented on FLINK-35342:
-

* 1.20 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/9184288079/job/25256599953#step:10:12493

> MaterializedTableStatementITCase test can check for wrong status
> 
>
> Key: FLINK-35342
> URL: https://issues.apache.org/jira/browse/FLINK-35342
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.20.0
> Reporter: Ryan Skraba
> Assignee: Feng Jin
> Priority: Critical
> Labels: pull-request-available, test-stability
>
> * 1.20 AdaptiveScheduler / Test (module: table) 
> https://github.com/apache/flink/actions/runs/9056197319/job/24879135605#step:10:12490
>  
> It looks like 
> {{MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume}}
>  can be flaky, where the expected status is not yet RUNNING:
> {code}
> Error: 03:24:03 03:24:03.902 [ERROR] Tests run: 6, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 26.78 s <<< FAILURE! -- in 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase
> Error: 03:24:03 03:24:03.902 [ERROR] 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(Path,
>  RestClusterClient) -- Time elapsed: 3.850 s <<< FAILURE!
> May 13 03:24:03 org.opentest4j.AssertionFailedError: 
> May 13 03:24:03 
> May 13 03:24:03 expected: "RUNNING"
> May 13 03:24:03  but was: "CREATED"
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> May 13 03:24:03   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> May 13 03:24:03   at 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(MaterializedTableStatementITCase.java:650)
> May 13 03:24:03   at java.lang.reflect.Method.invoke(Method.java:498)
> May 13 03:24:03   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> May 13 03:24:03 
> May 13 03:24:04 03:24:04.270 [INFO] 
> May 13 03:24:04 03:24:04.270 [INFO] Results:
> May 13 03:24:04 03:24:04.270 [INFO] 
> Error: 03:24:04 03:24:04.270 [ERROR] Failures: 
> Error: 03:24:04 03:24:04.271 [ERROR]   
> MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume:650
>  
> May 13 03:24:04 expected: "RUNNING"
> May 13 03:24:04  but was: "CREATED"
> May 13 03:24:04 03:24:04.271 [INFO] 
> Error: 03:24:04 03:24:04.271 [ERROR] Tests run: 82, Failures: 1, Errors: 0, 
> Skipped: 0
> {code}
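
The failure above asserts on the job status immediately, while the job may still be in CREATED. A common remedy for this kind of flakiness is to poll until the expected status is observed or a timeout expires, and only then assert. The sketch below illustrates that idea with hypothetical names (StatusPolling, awaitStatus); it is not the fix applied in Flink, which this thread does not describe.

```java
import java.time.Duration;
import java.util.function.Supplier;

/** Hypothetical helper; not part of Flink's test utilities. */
final class StatusPolling {
    private StatusPolling() {}

    /**
     * Polls statusSupplier until it returns the expected value or the timeout
     * elapses. Returns the last observed status so the caller can assert on it.
     */
    static String awaitStatus(
            Supplier<String> statusSupplier,
            String expected,
            Duration timeout,
            Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        String last = statusSupplier.get();
        // Subtraction keeps the comparison correct even if nanoTime wraps around.
        while (!expected.equals(last) && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                break;
            }
            last = statusSupplier.get();
        }
        return last;
    }
}
```

A test would then assert on the returned value, for example that awaitStatus(..., "RUNNING", ...) yields "RUNNING", so a slow transition out of CREATED no longer fails the run while a job that never starts still does.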





[jira] [Commented] (FLINK-35095) ExecutionEnvironmentImplTest.testFromSource failure on GitHub CI

2024-05-22 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848602#comment-17848602
 ] 

Ryan Skraba commented on FLINK-35095:
-

* 1.20 Java 17 / Test (module: misc) 
https://github.com/apache/flink/actions/runs/9184288079/job/25256627599#step:10:22297

> ExecutionEnvironmentImplTest.testFromSource failure on GitHub CI
> 
>
> Key: FLINK-35095
> URL: https://issues.apache.org/jira/browse/FLINK-35095
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.20.0
> Reporter: Ryan Skraba
> Priority: Critical
> Labels: test-stability
>
> 1.20 Java 17: Test (module: misc) 
> https://github.com/apache/flink/actions/runs/8655935935/job/23735920630#step:10:3
> {code}
> Error: 02:29:05 02:29:05.708 [ERROR] Tests run: 5, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 0.360 s <<< FAILURE! -- in 
> org.apache.flink.datastream.impl.ExecutionEnvironmentImplTest
> Error: 02:29:05 02:29:05.708 [ERROR] 
> org.apache.flink.datastream.impl.ExecutionEnvironmentImplTest.testFromSource 
> -- Time elapsed: 0.131 s <<< FAILURE!
> Apr 12 02:29:05 java.lang.AssertionError: 
> Apr 12 02:29:05 
> Apr 12 02:29:05 Expecting actual:
> Apr 12 02:29:05   [47]
> Apr 12 02:29:05 to contain exactly (and in same order):
> Apr 12 02:29:05   [48]
> Apr 12 02:29:05 but some elements were not found:
> Apr 12 02:29:05   [48]
> Apr 12 02:29:05 and others were not expected:
> Apr 12 02:29:05   [47]
> Apr 12 02:29:05 
> Apr 12 02:29:05   at 
> org.apache.flink.datastream.impl.ExecutionEnvironmentImplTest.testFromSource(ExecutionEnvironmentImplTest.java:97)
> Apr 12 02:29:05   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:568)
> Apr 12 02:29:05   at 
> java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194)
> Apr 12 02:29:05   at 
> java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
> Apr 12 02:29:05   at 
> java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
> Apr 12 02:29:05   at 
> java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
> Apr 12 02:29:05   at 
> java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
> Apr 12 02:29:05   at 
> java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
> Apr 12 02:29:05 
> {code}





[jira] [Commented] (FLINK-35380) ResumeCheckpointManuallyITCase hanging on tests

2024-05-22 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848604#comment-17848604
 ] 

Ryan Skraba commented on FLINK-35380:
-

* 1.20 Java 21 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9184288079/job/25256625597#step:10:9284

> ResumeCheckpointManuallyITCase hanging on tests 
> 
>
> Key: FLINK-35380
> URL: https://issues.apache.org/jira/browse/FLINK-35380
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.20.0
> Reporter: Ryan Skraba
> Priority: Critical
> Labels: test-stability
>
> * 1.20 Default (Java 8) / Test (module: tests) 
> https://github.com/apache/flink/actions/runs/9105407291/job/25031170942#step:10:11841
>  
> (This is a slightly different error, waiting in a different place than 
> FLINK-28319)
> {code}
> May 16 03:23:58 
> ==
> May 16 03:23:58 Process produced no output for 900 seconds.
> May 16 03:23:58 
> ==
> ... snip until stack trace ...
> May 16 03:23:58   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> May 16 03:23:58   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> May 16 03:23:58   at 
> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:410)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:378)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:318)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFullRocksDBCheckpointsWithLocalRecoveryStandalone(ResumeCheckpointManuallyITCase.java:133)
> {code}
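
The trace above ends in an unbounded CountDownLatch.await(), so when the latch is never released the only symptom is the CI watchdog killing the process after 900 seconds. As an illustration only (BoundedAwait and awaitOrFail are hypothetical names, and this is not a proposed patch for ResumeCheckpointManuallyITCase), a bounded wait turns the hang into a fast failure with a clear message:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper showing a bounded wait on a latch. */
final class BoundedAwait {
    private BoundedAwait() {}

    /** Waits for the latch, failing with a clear error instead of blocking forever. */
    static void awaitOrFail(CountDownLatch latch, long timeout, TimeUnit unit) {
        boolean completed;
        try {
            // await(timeout, unit) returns false if the count did not reach zero in time.
            completed = latch.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for latch", e);
        }
        if (!completed) {
            throw new IllegalStateException(
                    "Latch not released within " + timeout + " " + unit
                            + " (remaining count: " + latch.getCount() + ")");
        }
    }
}
```

This does not explain why the latch is never counted down. It only makes the failure show up at the waiting call rather than as a process-level timeout.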





[jira] [Commented] (FLINK-34513) GroupAggregateRestoreTest.testRestore fails

2024-05-22 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848601#comment-17848601
 ] 

Ryan Skraba commented on FLINK-34513:
-

* 1.19 Java 21 / Test (module: table) 
https://github.com/apache/flink/actions/runs/9184288072/job/25256618916#step:10:10748

> GroupAggregateRestoreTest.testRestore fails
> ---
>
> Key: FLINK-34513
> URL: https://issues.apache.org/jira/browse/FLINK-34513
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Affects Versions: 1.20.0
> Reporter: Matthias Pohl
> Assignee: Bonnie Varghese
> Priority: Critical
> Labels: pull-request-available, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57828&view=logs&j=26b84117-e436-5720-913e-3e280ce55cae&t=77cc7e77-39a0-5007-6d65-4137ac13a471&l=10881
> {code}
> Feb 24 01:12:01 01:12:01.384 [ERROR] Tests run: 10, Failures: 1, Errors: 0, 
> Skipped: 1, Time elapsed: 2.957 s <<< FAILURE! -- in 
> org.apache.flink.table.planner.plan.nodes.exec.stream.GroupAggregateRestoreTest
> Feb 24 01:12:01 01:12:01.384 [ERROR] 
> org.apache.flink.table.planner.plan.nodes.exec.stream.GroupAggregateRestoreTest.testRestore(TableTestProgram,
>  ExecNodeMetadata)[4] -- Time elapsed: 0.653 s <<< FAILURE!
> Feb 24 01:12:01 java.lang.AssertionError: 
> Feb 24 01:12:01 
> Feb 24 01:12:01 Expecting actual:
> Feb 24 01:12:01   ["+I[3, 1, 2, 8, 31, 10.0, 3]",
> Feb 24 01:12:01 "+I[2, 1, 4, 14, 42, 7.0, 6]",
> Feb 24 01:12:01 "+I[1, 1, 4, 12, 24, 6.0, 4]",
> Feb 24 01:12:01 "+U[2, 1, 4, 14, 57, 8.0, 7]",
> Feb 24 01:12:01 "+U[1, 1, 4, 12, 32, 6.0, 5]",
> Feb 24 01:12:01 "+I[7, 0, 1, 7, 7, 7.0, 1]",
> Feb 24 01:12:01 "+U[2, 1, 4, 14, 57, 7.0, 7]",
> Feb 24 01:12:01 "+U[1, 1, 4, 12, 32, 5.0, 5]",
> Feb 24 01:12:01 "+U[3, 1, 2, 8, 31, 9.0, 3]",
> Feb 24 01:12:01 "+U[7, 0, 1, 7, 7, 7.0, 2]"]
> Feb 24 01:12:01 to contain exactly in any order:
> Feb 24 01:12:01   ["+I[3, 1, 2, 8, 31, 10.0, 3]",
> Feb 24 01:12:01 "+I[2, 1, 4, 14, 42, 7.0, 6]",
> Feb 24 01:12:01 "+I[1, 1, 4, 12, 24, 6.0, 4]",
> Feb 24 01:12:01 "+U[2, 1, 4, 14, 57, 8.0, 7]",
> Feb 24 01:12:01 "+U[1, 1, 4, 12, 32, 6.0, 5]",
> Feb 24 01:12:01 "+U[3, 1, 2, 8, 31, 9.0, 3]",
> Feb 24 01:12:01 "+U[2, 1, 4, 14, 57, 7.0, 7]",
> Feb 24 01:12:01 "+I[7, 0, 1, 7, 7, 7.0, 2]",
> Feb 24 01:12:01 "+U[1, 1, 4, 12, 32, 5.0, 5]"]
> Feb 24 01:12:01 elements not found:
> Feb 24 01:12:01   ["+I[7, 0, 1, 7, 7, 7.0, 2]"]
> Feb 24 01:12:01 and elements not expected:
> Feb 24 01:12:01   ["+I[7, 0, 1, 7, 7, 7.0, 1]", "+U[7, 0, 1, 7, 7, 7.0, 2]"]
> Feb 24 01:12:01 
> Feb 24 01:12:01   at 
> org.apache.flink.table.planner.plan.nodes.exec.testutils.RestoreTestBase.testRestore(RestoreTestBase.java:313)
> Feb 24 01:12:01   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:580)
> [...]
> {code}





[jira] [Created] (FLINK-35418) EventTimeWindowCheckpointingITCase fails with an NPE

2024-05-22 Thread Ryan Skraba (Jira)
Ryan Skraba created FLINK-35418:
---

 Summary: EventTimeWindowCheckpointingITCase fails with an NPE
 Key: FLINK-35418
 URL: https://issues.apache.org/jira/browse/FLINK-35418
 Project: Flink
 Issue Type: Bug
 Affects Versions: 1.20.0
 Reporter: Ryan Skraba


* 1.20 Default (Java 8) / Test (module: tests) 
[https://github.com/apache/flink/actions/runs/9185169193/job/25258948607#step:10:8106]

It looks like it's possible for PhysicalFile to generate a NullPointerException 
while a checkpoint is being aborted:

{code}
May 22 04:35:18 Starting 
org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase#testTumblingTimeWindow[statebackend
 type =ROCKSDB_INCREMENTAL_ZK, buffersPerChannel = 2].
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at 
org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
at 
org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
at 
org.apache.flink.runtime.rpc.pekko.PekkoInvocationHandler.lambda$invokeRpc$1(PekkoInvocationHandler.java:268)
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
at 
org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1287)
at 
org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
at 
org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
at 
org.apache.flink.runtime.concurrent.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
at 
org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$1.onComplete(ScalaFutureUtils.java:47)
at org.apache.pekko.dispatch.OnComplete.internal(Future.scala:310)
at org.apache.pekko.dispatch.OnComplete.internal(Future.scala:307)
at org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:234)
at org.apache.pekko.dispatch.japi$CallbackBridge.apply(Future.scala:231)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at 
org.apache.flink.runtime.concurrent.pekko.ScalaFutureUtils$DirectExecutionContext.execute(ScalaFutureUtils.java:65)
at 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
at 
scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
at 
scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
at 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
at org.apache.pekko.pattern.PromiseActorRef.$bang(AskSupport.scala:629)
at 
org.apache.pekko.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:34)
at 
org.apache.pekko.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:33)
at scala.concurrent.Future.$anonfun$andThen$1(Future.scala:536)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at 
org.apache.pekko.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:73)
at 
org.apache.pekko.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:110)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at 
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:85)
at 
org.apache.pekko.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:110)
at 

[jira] [Commented] (FLINK-34513) GroupAggregateRestoreTest.testRestore fails

2024-05-21 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848233#comment-17848233
 ] 

Ryan Skraba commented on FLINK-34513:
-

* 1.20 Java 8 / Test (module: table) 
[https://github.com/apache/flink/actions/runs/9144334458/job/25142198437#step:10:10702]

> GroupAggregateRestoreTest.testRestore fails
> ---
>
> Key: FLINK-34513
> URL: https://issues.apache.org/jira/browse/FLINK-34513
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Affects Versions: 1.20.0
> Reporter: Matthias Pohl
> Assignee: Bonnie Varghese
> Priority: Critical
> Labels: pull-request-available, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57828&view=logs&j=26b84117-e436-5720-913e-3e280ce55cae&t=77cc7e77-39a0-5007-6d65-4137ac13a471&l=10881
> {code}
> Feb 24 01:12:01 01:12:01.384 [ERROR] Tests run: 10, Failures: 1, Errors: 0, 
> Skipped: 1, Time elapsed: 2.957 s <<< FAILURE! -- in 
> org.apache.flink.table.planner.plan.nodes.exec.stream.GroupAggregateRestoreTest
> Feb 24 01:12:01 01:12:01.384 [ERROR] 
> org.apache.flink.table.planner.plan.nodes.exec.stream.GroupAggregateRestoreTest.testRestore(TableTestProgram,
>  ExecNodeMetadata)[4] -- Time elapsed: 0.653 s <<< FAILURE!
> Feb 24 01:12:01 java.lang.AssertionError: 
> Feb 24 01:12:01 
> Feb 24 01:12:01 Expecting actual:
> Feb 24 01:12:01   ["+I[3, 1, 2, 8, 31, 10.0, 3]",
> Feb 24 01:12:01 "+I[2, 1, 4, 14, 42, 7.0, 6]",
> Feb 24 01:12:01 "+I[1, 1, 4, 12, 24, 6.0, 4]",
> Feb 24 01:12:01 "+U[2, 1, 4, 14, 57, 8.0, 7]",
> Feb 24 01:12:01 "+U[1, 1, 4, 12, 32, 6.0, 5]",
> Feb 24 01:12:01 "+I[7, 0, 1, 7, 7, 7.0, 1]",
> Feb 24 01:12:01 "+U[2, 1, 4, 14, 57, 7.0, 7]",
> Feb 24 01:12:01 "+U[1, 1, 4, 12, 32, 5.0, 5]",
> Feb 24 01:12:01 "+U[3, 1, 2, 8, 31, 9.0, 3]",
> Feb 24 01:12:01 "+U[7, 0, 1, 7, 7, 7.0, 2]"]
> Feb 24 01:12:01 to contain exactly in any order:
> Feb 24 01:12:01   ["+I[3, 1, 2, 8, 31, 10.0, 3]",
> Feb 24 01:12:01 "+I[2, 1, 4, 14, 42, 7.0, 6]",
> Feb 24 01:12:01 "+I[1, 1, 4, 12, 24, 6.0, 4]",
> Feb 24 01:12:01 "+U[2, 1, 4, 14, 57, 8.0, 7]",
> Feb 24 01:12:01 "+U[1, 1, 4, 12, 32, 6.0, 5]",
> Feb 24 01:12:01 "+U[3, 1, 2, 8, 31, 9.0, 3]",
> Feb 24 01:12:01 "+U[2, 1, 4, 14, 57, 7.0, 7]",
> Feb 24 01:12:01 "+I[7, 0, 1, 7, 7, 7.0, 2]",
> Feb 24 01:12:01 "+U[1, 1, 4, 12, 32, 5.0, 5]"]
> Feb 24 01:12:01 elements not found:
> Feb 24 01:12:01   ["+I[7, 0, 1, 7, 7, 7.0, 2]"]
> Feb 24 01:12:01 and elements not expected:
> Feb 24 01:12:01   ["+I[7, 0, 1, 7, 7, 7.0, 1]", "+U[7, 0, 1, 7, 7, 7.0, 2]"]
> Feb 24 01:12:01 
> Feb 24 01:12:01   at 
> org.apache.flink.table.planner.plan.nodes.exec.testutils.RestoreTestBase.testRestore(RestoreTestBase.java:313)
> Feb 24 01:12:01   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:580)
> [...]
> {code}





[jira] [Commented] (FLINK-35382) ChangelogCompatibilityITCase.testRestore fails with an NPE

2024-05-21 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848220#comment-17848220
 ] 

Ryan Skraba commented on FLINK-35382:
-

Thanks for the fix!  I'm just noting this build failure from 3 days ago 
(doesn't include the fix yet):

* 1.20 Java 11 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9136523142/job/25125588800#step:10:8741

> ChangelogCompatibilityITCase.testRestore fails with an NPE
> --
>
> Key: FLINK-35382
> URL: https://issues.apache.org/jira/browse/FLINK-35382
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.20.0
> Reporter: Ryan Skraba
> Assignee: Jinzhong Li
> Priority: Critical
> Labels: pull-request-available, test-stability
>
> * 1.20 Java 8 / Test (module: tests) 
> https://github.com/apache/flink/actions/runs/9110398985/job/25045798401#step:10:8192
> It looks like there can be a [NullPointerException at this 
> line|https://github.com/apache/flink/blob/9a5a99b1a30054268bbde36d565cbb1b81018890/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/filemerging/FileMergingSnapshotManagerBase.java#L666]
>  causing a test failure:
> {code}
> Error: 10:36:23 10:36:23.312 [ERROR] Tests run: 9, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 19.31 s <<< FAILURE! -- in 
> org.apache.flink.test.state.ChangelogCompatibilityITCase
> Error: 10:36:23 10:36:23.313 [ERROR] 
> org.apache.flink.test.state.ChangelogCompatibilityITCase.testRestore[startWithChangelog=false,
>  restoreWithChangelog=true, restoreFrom=CHECKPOINT, allowStore=true, 
> allowRestore=true] -- Time elapsed: 1.492 s <<< ERROR!
> May 16 10:36:23 java.lang.RuntimeException: 
> org.opentest4j.AssertionFailedError: Graph is in globally terminal state 
> (FAILED)
> May 16 10:36:23   at 
> org.apache.flink.test.state.ChangelogCompatibilityITCase.tryRun(ChangelogCompatibilityITCase.java:204)
> May 16 10:36:23   at 
> org.apache.flink.test.state.ChangelogCompatibilityITCase.restoreAndValidate(ChangelogCompatibilityITCase.java:190)
> May 16 10:36:23   at java.util.Optional.ifPresent(Optional.java:159)
> May 16 10:36:23   at 
> org.apache.flink.test.state.ChangelogCompatibilityITCase.testRestore(ChangelogCompatibilityITCase.java:118)
> May 16 10:36:23   at java.lang.reflect.Method.invoke(Method.java:498)
> May 16 10:36:23 Caused by: org.opentest4j.AssertionFailedError: Graph is in 
> globally terminal state (FAILED)
> May 16 10:36:23   at 
> org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:42)
> May 16 10:36:23   at 
> org.junit.jupiter.api.Assertions.fail(Assertions.java:150)
> May 16 10:36:23   at 
> org.apache.flink.runtime.testutils.CommonTestUtils.lambda$waitForAllTaskRunning$3(CommonTestUtils.java:214)
> May 16 10:36:23   at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:151)
> May 16 10:36:23   at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145)
> May 16 10:36:23   at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitForAllTaskRunning(CommonTestUtils.java:209)
> May 16 10:36:23   at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitForAllTaskRunning(CommonTestUtils.java:182)
> May 16 10:36:23   at 
> org.apache.flink.test.state.ChangelogCompatibilityITCase.submit(ChangelogCompatibilityITCase.java:284)
> May 16 10:36:23   at 
> org.apache.flink.test.state.ChangelogCompatibilityITCase.tryRun(ChangelogCompatibilityITCase.java:197)
> May 16 10:36:23   ... 4 more
> May 16 10:36:23 Caused by: org.apache.flink.runtime.JobException: 
> org.apache.flink.runtime.JobException: Recovery is suppressed by 
> NoRestartBackoffTimeStrategy
> May 16 10:36:23   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:219)
> May 16 10:36:23   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailureAndReport(ExecutionFailureHandler.java:166)
> May 16 10:36:23   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:121)
> May 16 10:36:23   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.recordTaskFailure(DefaultScheduler.java:279)
> May 16 10:36:23   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:270)
> May 16 10:36:23   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.onTaskFailed(DefaultScheduler.java:263)
> May 16 10:36:23   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.onTaskExecutionStateUpdate(SchedulerBase.java:788)
> May 16 10:36:23   at 
> 

[jira] [Commented] (FLINK-35342) MaterializedTableStatementITCase test can check for wrong status

2024-05-21 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848217#comment-17848217
 ] 

Ryan Skraba commented on FLINK-35342:
-

* 1.20 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/9167756989/job/25205665075#step:10:12475
* 1.20 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/9152485864/job/25160229892#step:10:12475
* 1.20 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/9144334458/job/25142199658#step:10:12490
* 1.20 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/9136523142/job/25125573106#step:10:12493

> MaterializedTableStatementITCase test can check for wrong status
> 
>
> Key: FLINK-35342
> URL: https://issues.apache.org/jira/browse/FLINK-35342
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.20.0
> Reporter: Ryan Skraba
> Assignee: Feng Jin
> Priority: Critical
> Labels: pull-request-available, test-stability
>
> * 1.20 AdaptiveScheduler / Test (module: table) 
> https://github.com/apache/flink/actions/runs/9056197319/job/24879135605#step:10:12490
>  
> It looks like 
> {{MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume}}
>  can be flaky, where the expected status is not yet RUNNING:
> {code}
> Error: 03:24:03 03:24:03.902 [ERROR] Tests run: 6, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 26.78 s <<< FAILURE! -- in 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase
> Error: 03:24:03 03:24:03.902 [ERROR] 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(Path,
>  RestClusterClient) -- Time elapsed: 3.850 s <<< FAILURE!
> May 13 03:24:03 org.opentest4j.AssertionFailedError: 
> May 13 03:24:03 
> May 13 03:24:03 expected: "RUNNING"
> May 13 03:24:03  but was: "CREATED"
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> May 13 03:24:03   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> May 13 03:24:03   at 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(MaterializedTableStatementITCase.java:650)
> May 13 03:24:03   at java.lang.reflect.Method.invoke(Method.java:498)
> May 13 03:24:03   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> May 13 03:24:03 
> May 13 03:24:04 03:24:04.270 [INFO] 
> May 13 03:24:04 03:24:04.270 [INFO] Results:
> May 13 03:24:04 03:24:04.270 [INFO] 
> Error: 03:24:04 03:24:04.270 [ERROR] Failures: 
> Error: 03:24:04 03:24:04.271 [ERROR]   
> MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume:650
>  
> May 13 03:24:04 expected: "RUNNING"
> May 13 03:24:04  but was: "CREATED"
> May 13 03:24:04 03:24:04.271 [INFO] 
> Error: 03:24:04 03:24:04.271 [ERROR] Tests run: 82, Failures: 1, Errors: 0, 
> Skipped: 0
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35380) ResumeCheckpointManuallyITCase hanging on tests

2024-05-21 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848218#comment-17848218
 ] 

Ryan Skraba commented on FLINK-35380:
-

* 1.20 Java 11 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9167756989/job/25205690872#step:10:11589
* 1.20 Java 21 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9152485864/job/25160249723#step:10:9091
* 1.20 Java 17 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9144334458/job/25142210330#step:10:9094
* 1.20 AdaptiveScheduler / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9144334458/job/25142199866#step:10:11729
* 1.20 AdaptiveScheduler / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9136523142/job/25125573321#step:10:11731

> ResumeCheckpointManuallyITCase hanging on tests 
> 
>
> Key: FLINK-35380
> URL: https://issues.apache.org/jira/browse/FLINK-35380
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.20.0
> Reporter: Ryan Skraba
> Priority: Critical
> Labels: test-stability
>
> * 1.20 Default (Java 8) / Test (module: tests) 
> https://github.com/apache/flink/actions/runs/9105407291/job/25031170942#step:10:11841
>  
> (This is a slightly different error, waiting in a different place than 
> FLINK-28319)
> {code}
> May 16 03:23:58 
> ==
> May 16 03:23:58 Process produced no output for 900 seconds.
> May 16 03:23:58 
> ==
> ... snip until stack trace ...
> May 16 03:23:58   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> May 16 03:23:58   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> May 16 03:23:58   at 
> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:410)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:378)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:318)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFullRocksDBCheckpointsWithLocalRecoveryStandalone(ResumeCheckpointManuallyITCase.java:133)
> {code}





[jira] [Commented] (FLINK-35002) GitHub action request timeout to ArtifactService

2024-05-21 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848216#comment-17848216
 ] 

Ryan Skraba commented on FLINK-35002:
-

* 1.18 Java 8 / Compile 
https://github.com/apache/flink/commit/08ba6497e3ec7106021043612118df77330f9797/checks/25205410372/logs
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/commit/e5398e1025ec4312bac74a8b32b98d03cb254667/checks/25204700233/logs

> GitHub action request timeout to ArtifactService
> -
>
> Key: FLINK-35002
> URL: https://issues.apache.org/jira/browse/FLINK-35002
> Project: Flink
> Issue Type: Bug
> Components: Build System
> Reporter: Ryan Skraba
> Priority: Major
> Labels: github-actions, test-stability
>
> A timeout can occur when uploading a successfully built artifact:
>  * [https://github.com/apache/flink/actions/runs/8516411871/job/23325392650]
> {code:java}
> 2024-04-02T02:20:15.6355368Z With the provided path, there will be 1 file 
> uploaded
> 2024-04-02T02:20:15.6360133Z Artifact name is valid!
> 2024-04-02T02:20:15.6362872Z Root directory input is valid!
> 2024-04-02T02:20:20.6975036Z Attempt 1 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 3000 ms...
> 2024-04-02T02:20:28.7084937Z Attempt 2 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 4785 ms...
> 2024-04-02T02:20:38.5015936Z Attempt 3 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 7375 ms...
> 2024-04-02T02:20:50.8901508Z Attempt 4 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 14988 ms...
> 2024-04-02T02:21:10.9028438Z ##[error]Failed to CreateArtifact: Failed to 
> make request after 5 attempts: Request timeout: 
> /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact
> 2024-04-02T02:22:59.9893296Z Post job cleanup.
> 2024-04-02T02:22:59.9958844Z Post job cleanup. {code}
> (This is unlikely to be something we can fix, but we can track it.)





[jira] [Created] (FLINK-35413) VertexFinishedStateCheckerTest causes exit 239

2024-05-21 Thread Ryan Skraba (Jira)
Ryan Skraba created FLINK-35413:
---

 Summary: VertexFinishedStateCheckerTest causes exit 239
 Key: FLINK-35413
 URL: https://issues.apache.org/jira/browse/FLINK-35413
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.20.0
Reporter: Ryan Skraba


1.20 test_cron_azure core 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59676=logs=77a9d8e1-d610-59b3-fc2a-4766541e0e33=125e07e7-8de0-5c6c-a541-a567415af3ef=9429

{code}
May 21 01:31:42 01:31:42.160 [ERROR] 
org.apache.flink.runtime.checkpoint.VertexFinishedStateCheckerTest
May 21 01:31:42 01:31:42.160 [ERROR] 
org.apache.maven.surefire.booter.SurefireBooterForkException: 
ExecutionException The forked VM terminated without properly saying goodbye. VM 
crash or System.exit called?
May 21 01:31:42 01:31:42.160 [ERROR] Command was /bin/sh -c cd 
'/__w/1/s/flink-runtime' && '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' 
'-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' 
'--add-opens=java.base/java.util=ALL-UNNAMED' 
'--add-opens=java.base/java.lang=ALL-UNNAMED' 
'--add-opens=java.base/java.net=ALL-UNNAMED' 
'--add-opens=java.base/java.io=ALL-UNNAMED' 
'--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' 
'/__w/1/s/flink-runtime/target/surefire/surefirebooter-20240521011847857_99.jar'
 '/__w/1/s/flink-runtime/target/surefire' '2024-05-21T01-15-09_325-jvmRun1' 
'surefire-20240521011847857_97tmp' 'surefire_29-20240521011847857_98tmp'
May 21 01:31:42 01:31:42.160 [ERROR] Error occurred in starting fork, check 
output in log
May 21 01:31:42 01:31:42.160 [ERROR] Process Exit Code: 239
May 21 01:31:42 01:31:42.160 [ERROR] Crashed tests:
May 21 01:31:42 01:31:42.160 [ERROR] 
org.apache.flink.runtime.checkpoint.VertexFinishedStateCheckerTest
May 21 01:31:42 01:31:42.160 [ERROR]   at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456)
May 21 01:31:42 01:31:42.160 [ERROR]   at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkOnceMultiple(ForkStarter.java:358)
May 21 01:31:42 01:31:42.160 [ERROR]   at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:296)
May 21 01:31:42 01:31:42.160 [ERROR]   at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:250)
May 21 01:31:42 01:31:42.160 [ERROR]   at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1240)
May 21 01:31:42 01:31:42.160 [ERROR]   at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1089)
May 21 01:31:42 01:31:42.160 [ERROR]   at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:905)
May 21 01:31:42 01:31:42.160 [ERROR]   at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
{code}

In the build artifact {{mvn-1.log}} the following FATAL error is found:

{code}
01:19:08,584 [ pool-9-thread-1] ERROR 
org.apache.flink.util.FatalExitExceptionHandler  [] - FATAL: Thread 
'pool-9-thread-1' produced an uncaught exception. Stopping the process...
java.util.concurrent.CompletionException: 
java.util.concurrent.RejectedExecutionException: Task 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@5ead9062 
rejected from 
java.util.concurrent.ScheduledThreadPoolExecutor@4d0e55ac[Shutting down, pool 
size = 1, active threads = 1, queued tasks = 1, completed tasks = 194]
at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
 ~[?:1.8.0_292]
at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
 ~[?:1.8.0_292]
at 
java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:838) 
~[?:1.8.0_292]
at 
java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)
 ~[?:1.8.0_292]
at 
java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:851)
 ~[?:1.8.0_292]
at 
java.util.concurrent.CompletableFuture.handleAsync(CompletableFuture.java:2178) 
~[?:1.8.0_292]
at 
org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer.allocateSlot(DefaultSlotStatusSyncer.java:138)
 ~[classes/:?]
at 
org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.allocateSlotsAccordingTo(FineGrainedSlotManager.java:722)
 ~[classes/:?]
at 
org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.checkResourceRequirements(FineGrainedSlotManager.java:645)
 ~[classes/:?]
at 
org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.lambda$null$12(FineGrainedSlotManager.java:603)
 ~[classes/:?]
at 

[jira] [Commented] (FLINK-34673) SessionRelatedITCase#testTouchSession failure on GitHub Actions

2024-05-21 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848115#comment-17848115
 ] 

Ryan Skraba commented on FLINK-34673:
-

Hello!  This one is easy to reproduce: launch it in IntelliJ (for example) 
with the run configuration set to Repeat until failure.  It normally takes 
only about 100 runs to hit the failure.

In this case, the motivation is to reduce the (vast) number of flaky tests so 
that we can trust a failing CI run to indicate a real problem.  The test is 
faulty, not any code that would end up in production!
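
For illustration only, here is a minimal Java sketch of why an assertion like 
the quoted one can be flaky.  It is not the code of SessionRelatedITCase; the 
class and method names are hypothetical.  Two timestamps read in quick 
succession from a millisecond clock can be equal (the quoted failure shows 
1710299301198L on both sides), so a strictly-greater-than check can fail:

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch; not the code of SessionRelatedITCase. */
class TimestampAssertionSketch {

    /** The fragile check: false when both reads land in the same millisecond. */
    static boolean strictlyLater(long before, long after) {
        return after > before;
    }

    /** One possible tolerant check: equal timestamps are accepted. */
    static boolean notEarlier(long before, long after) {
        return after >= before;
    }

    /** Counts how often two back-to-back clock reads return the same value. */
    static int countEqualReads(LongSupplier clock, int attempts) {
        int equal = 0;
        for (int i = 0; i < attempts; i++) {
            long first = clock.getAsLong();
            long second = clock.getAsLong();
            if (first == second) {
                equal++;
            }
        }
        return equal;
    }

    public static void main(String[] args) {
        long value = 1710299301198L; // value taken from the quoted failure
        System.out.println("strictlyLater: " + strictlyLater(value, value));
        System.out.println("notEarlier:    " + notEarlier(value, value));
        int equal = countEqualReads(System::currentTimeMillis, 1_000);
        System.out.println("equal back-to-back reads out of 1000: " + equal);
    }
}
```

Whether a tolerant comparison is the right fix depends on what the test is 
meant to verify; the sketch only shows how the equal-timestamp case arises.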

> SessionRelatedITCase#testTouchSession failure on GitHub Actions
> ---
>
> Key: FLINK-34673
> URL: https://issues.apache.org/jira/browse/FLINK-34673
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Gateway
> Affects Versions: 1.19.0
> Reporter: Ryan Skraba
> Priority: Major
> Labels: starter, test-stability
>
> [https://github.com/apache/flink/actions/runs/8258416388/job/22590907051#step:10:12155]
> {code:java}
>  Error: 03:08:21 03:08:21.304 [ERROR] 
> org.apache.flink.table.gateway.rest.SessionRelatedITCase.testTouchSession -- 
> Time elapsed: 0.015 s <<< FAILURE!
> Mar 13 03:08:21 java.lang.AssertionError: 
> Mar 13 03:08:21 
> Mar 13 03:08:21 Expecting actual:
> Mar 13 03:08:21   1710299301198L
> Mar 13 03:08:21 to be greater than:
> Mar 13 03:08:21   1710299301198L
> Mar 13 03:08:21 
> Mar 13 03:08:21     at 
> org.apache.flink.table.gateway.rest.SessionRelatedITCase.testTouchSession(SessionRelatedITCase.java:175)
> Mar 13 03:08:21     at 
> java.base/java.lang.reflect.Method.invoke(Method.java:580)
> Mar 13 03:08:21     at 
> java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194)
> Mar 13 03:08:21     at 
> java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
> Mar 13 03:08:21     at 
> java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312)
> Mar 13 03:08:21     at 
> java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843)
> Mar 13 03:08:21     at 
> java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808)
> Mar 13 03:08:21     at 
> java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)
> {code}





[jira] [Commented] (FLINK-35382) ChangelogCompatibilityITCase.testRestore fails with an NPE

2024-05-17 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847262#comment-17847262
 ] 

Ryan Skraba commented on FLINK-35382:
-

* 1.20 Default (Java 8) / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9123384110/job/25085945877#step:10:8789

> ChangelogCompatibilityITCase.testRestore fails with an NPE
> --
>
> Key: FLINK-35382
> URL: https://issues.apache.org/jira/browse/FLINK-35382
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.20.0
> Reporter: Ryan Skraba
> Priority: Critical
> Labels: test-stability
>
> * 1.20 Java 8 / Test (module: tests) 
> https://github.com/apache/flink/actions/runs/9110398985/job/25045798401#step:10:8192
> It looks like there can be a [NullPointerException at this 
> line|https://github.com/apache/flink/blob/9a5a99b1a30054268bbde36d565cbb1b81018890/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/filemerging/FileMergingSnapshotManagerBase.java#L666]
>  causing a test failure:
> {code}
> Error: 10:36:23 10:36:23.312 [ERROR] Tests run: 9, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 19.31 s <<< FAILURE! -- in 
> org.apache.flink.test.state.ChangelogCompatibilityITCase
> Error: 10:36:23 10:36:23.313 [ERROR] 
> org.apache.flink.test.state.ChangelogCompatibilityITCase.testRestore[startWithChangelog=false,
>  restoreWithChangelog=true, restoreFrom=CHECKPOINT, allowStore=true, 
> allowRestore=true] -- Time elapsed: 1.492 s <<< ERROR!
> May 16 10:36:23 java.lang.RuntimeException: 
> org.opentest4j.AssertionFailedError: Graph is in globally terminal state 
> (FAILED)
> May 16 10:36:23   at 
> org.apache.flink.test.state.ChangelogCompatibilityITCase.tryRun(ChangelogCompatibilityITCase.java:204)
> May 16 10:36:23   at 
> org.apache.flink.test.state.ChangelogCompatibilityITCase.restoreAndValidate(ChangelogCompatibilityITCase.java:190)
> May 16 10:36:23   at java.util.Optional.ifPresent(Optional.java:159)
> May 16 10:36:23   at 
> org.apache.flink.test.state.ChangelogCompatibilityITCase.testRestore(ChangelogCompatibilityITCase.java:118)
> May 16 10:36:23   at java.lang.reflect.Method.invoke(Method.java:498)
> May 16 10:36:23 Caused by: org.opentest4j.AssertionFailedError: Graph is in 
> globally terminal state (FAILED)
> May 16 10:36:23   at 
> org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:42)
> May 16 10:36:23   at 
> org.junit.jupiter.api.Assertions.fail(Assertions.java:150)
> May 16 10:36:23   at 
> org.apache.flink.runtime.testutils.CommonTestUtils.lambda$waitForAllTaskRunning$3(CommonTestUtils.java:214)
> May 16 10:36:23   at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:151)
> May 16 10:36:23   at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145)
> May 16 10:36:23   at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitForAllTaskRunning(CommonTestUtils.java:209)
> May 16 10:36:23   at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitForAllTaskRunning(CommonTestUtils.java:182)
> May 16 10:36:23   at 
> org.apache.flink.test.state.ChangelogCompatibilityITCase.submit(ChangelogCompatibilityITCase.java:284)
> May 16 10:36:23   at 
> org.apache.flink.test.state.ChangelogCompatibilityITCase.tryRun(ChangelogCompatibilityITCase.java:197)
> May 16 10:36:23   ... 4 more
> May 16 10:36:23 Caused by: org.apache.flink.runtime.JobException: 
> org.apache.flink.runtime.JobException: Recovery is suppressed by 
> NoRestartBackoffTimeStrategy
> May 16 10:36:23   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:219)
> May 16 10:36:23   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailureAndReport(ExecutionFailureHandler.java:166)
> May 16 10:36:23   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:121)
> May 16 10:36:23   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.recordTaskFailure(DefaultScheduler.java:279)
> May 16 10:36:23   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:270)
> May 16 10:36:23   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.onTaskFailed(DefaultScheduler.java:263)
> May 16 10:36:23   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.onTaskExecutionStateUpdate(SchedulerBase.java:788)
> May 16 10:36:23   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:765)
> May 16 10:36:23   at 
> 

[jira] [Commented] (FLINK-35380) ResumeCheckpointManuallyITCase hanging on tests

2024-05-17 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847261#comment-17847261
 ] 

Ryan Skraba commented on FLINK-35380:
-

* 1.20 Default (Java 8) / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9121965925/job/25082216099#step:10:9953
* 1.20 test_cron_jdk21 tests 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59617=logs=a657ddbf-d986-5381-9649-342d9c92e7fb=dc085d4a-05c8-580e-06ab-21f5624dab16=12051
* 1.20 test_cron_adaptive_scheduler tests 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59617=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=13040

> ResumeCheckpointManuallyITCase hanging on tests 
> 
>
> Key: FLINK-35380
> URL: https://issues.apache.org/jira/browse/FLINK-35380
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.20.0
> Reporter: Ryan Skraba
> Priority: Critical
> Labels: test-stability
>
> * 1.20 Default (Java 8) / Test (module: tests) 
> https://github.com/apache/flink/actions/runs/9105407291/job/25031170942#step:10:11841
>  
> (This is a slightly different error, waiting in a different place than 
> FLINK-28319)
> {code}
> May 16 03:23:58 
> ==
> May 16 03:23:58 Process produced no output for 900 seconds.
> May 16 03:23:58 
> ==
> ... snip until stack trace ...
> May 16 03:23:58   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> May 16 03:23:58   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> May 16 03:23:58   at 
> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:410)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:378)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:318)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFullRocksDBCheckpointsWithLocalRecoveryStandalone(ResumeCheckpointManuallyITCase.java:133)
> {code}





[jira] [Commented] (FLINK-35342) MaterializedTableStatementITCase test can check for wrong status

2024-05-17 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847260#comment-17847260
 ] 

Ryan Skraba commented on FLINK-35342:
-

I didn't succeed in reproducing the error locally... unfortunately, sometimes 
CI is the only way for me too!  If anyone has some expertise in this area or a 
pointer, I would love to learn!

In the meantime: 
* 1.20 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/9121895520/job/25082050482#step:10:12475
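
For reference, the quoted description suggests the status is read before it 
has reached RUNNING.  A common general pattern for that kind of check is to 
poll with a timeout instead of reading once.  The Java sketch below is 
hypothetical (names and timings are assumptions, not taken from the Flink 
code base), and whether it applies to this test is also an assumption:

```java
import java.time.Duration;
import java.util.function.Supplier;

/** Hypothetical polling helper; not taken from the Flink code base. */
class WaitForStatusSketch {

    /**
     * Polls the supplier until it returns the expected status or the timeout
     * expires. Returns true if the expected status was observed.
     */
    static boolean waitForStatus(
            Supplier<String> status, String expected, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (expected.equals(status.get())) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    /** Simulated job status: "CREATED" for the first few polls, then "RUNNING". */
    static Supplier<String> simulatedStatus(int createdPolls) {
        int[] polls = {0};
        return () -> polls[0]++ < createdPolls ? "CREATED" : "RUNNING";
    }

    public static void main(String[] args) {
        Supplier<String> status = simulatedStatus(3);
        // A single immediate read can still see the initial state.
        System.out.println("first read: " + status.get());
        boolean reached =
                waitForStatus(status, "RUNNING", Duration.ofSeconds(5), Duration.ofMillis(10));
        System.out.println("reached RUNNING: " + reached);
    }
}
```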

> MaterializedTableStatementITCase test can check for wrong status
> 
>
> Key: FLINK-35342
> URL: https://issues.apache.org/jira/browse/FLINK-35342
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.20.0
> Reporter: Ryan Skraba
> Priority: Critical
> Labels: pull-request-available, test-stability
>
> * 1.20 AdaptiveScheduler / Test (module: table) 
> https://github.com/apache/flink/actions/runs/9056197319/job/24879135605#step:10:12490
>  
> It looks like 
> {{MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume}}
>  can be flaky, where the expected status is not yet RUNNING:
> {code}
> Error: 03:24:03 03:24:03.902 [ERROR] Tests run: 6, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 26.78 s <<< FAILURE! -- in 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase
> Error: 03:24:03 03:24:03.902 [ERROR] 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(Path,
>  RestClusterClient) -- Time elapsed: 3.850 s <<< FAILURE!
> May 13 03:24:03 org.opentest4j.AssertionFailedError: 
> May 13 03:24:03 
> May 13 03:24:03 expected: "RUNNING"
> May 13 03:24:03  but was: "CREATED"
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> May 13 03:24:03   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> May 13 03:24:03   at 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(MaterializedTableStatementITCase.java:650)
> May 13 03:24:03   at java.lang.reflect.Method.invoke(Method.java:498)
> May 13 03:24:03   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> May 13 03:24:03 
> May 13 03:24:04 03:24:04.270 [INFO] 
> May 13 03:24:04 03:24:04.270 [INFO] Results:
> May 13 03:24:04 03:24:04.270 [INFO] 
> Error: 03:24:04 03:24:04.270 [ERROR] Failures: 
> Error: 03:24:04 03:24:04.271 [ERROR]   
> MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume:650
>  
> May 13 03:24:04 expected: "RUNNING"
> May 13 03:24:04  but was: "CREATED"
> May 13 03:24:04 03:24:04.271 [INFO] 
> Error: 03:24:04 03:24:04.271 [ERROR] Tests run: 82, Failures: 1, Errors: 0, 
> Skipped: 0
> {code}





[jira] [Commented] (FLINK-34273) git fetch fails

2024-05-17 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847259#comment-17847259
 ] 

Ryan Skraba commented on FLINK-34273:
-

* 1.20 test_cron_adaptive_scheduler table 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59617=logs=f2c100be-250b-5e85-7bbe-176f68fcddc5=6d51823d-b341-5f58-cf42-40e574735727=980

> git fetch fails
> ---
>
> Key: FLINK-34273
> URL: https://issues.apache.org/jira/browse/FLINK-34273
> Project: Flink
> Issue Type: Bug
> Components: Build System / CI, Test Infrastructure
> Affects Versions: 1.19.0, 1.18.1, 1.20.0
> Reporter: Matthias Pohl
> Priority: Major
> Labels: test-stability
>
> We've seen multiple {{git fetch}} failures. I assume this to be an 
> infrastructure issue. This Jira issue is for documentation purposes.
> {code:java}
> error: RPC failed; curl 18 transfer closed with outstanding read data 
> remaining
> error: 5211 bytes of body are still expected
> fetch-pack: unexpected disconnect while reading sideband packet
> fatal: early EOF
> fatal: fetch-pack: invalid index-pack output {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57080=logs=0e7be18f-84f2-53f0-a32d-4a5e4a174679=5d6dc3d3-393d-5111-3a40-c6a5a36202e6=667





[jira] [Commented] (FLINK-25168) Azure failed due to unable to transfer maven artifacts

2024-05-16 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847027#comment-17847027
 ] 

Ryan Skraba commented on FLINK-25168:
-

* 1.19 cron_snapshot_deployment_maven 
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59585=logs=eca6b3a6-1600-56cc-916a-c549b3cde3ff=7b3c1df5-9194-5183-5ebd-5567f52d5f8f=6027]

Another failure... it might be that repository.apache.org is unstable at the 
moment?  In the log, we can also see many retries (eventually succeeding) that 
look like this:

{code}
03:23:16.738 [WARNING] Failed to upload checksum to 
org/apache/flink/flink-dstl-dfs/1.19-SNAPSHOT/flink-dstl-dfs-1.19-20240516.031110-210-javadoc.jar.sha1
org.apache.maven.wagon.TransferFailedException: transfer failed for 
https://repository.apache.org/content/repositories/snapshots/org/apache/flink/flink-dstl-dfs/1.19-SNAPSHOT/flink-dstl-dfs-1.19-20240516.031110-210-javadoc.jar.sha1,
 status: 408 Request Timeout
at 
org.apache.maven.wagon.providers.http.wagon.shared.AbstractHttpClientWagon.put 
(AbstractHttpClientWagon.java:835)
at 
org.apache.maven.wagon.providers.http.wagon.shared.AbstractHttpClientWagon.put 
(AbstractHttpClientWagon.java:750)
at 
org.apache.maven.wagon.providers.http.wagon.shared.AbstractHttpClientWagon.put 
(AbstractHttpClientWagon.java:722)
at 
org.apache.maven.wagon.providers.http.wagon.shared.AbstractHttpClientWagon.put 
(AbstractHttpClientWagon.java:716)
at 
org.apache.maven.wagon.providers.http.wagon.shared.AbstractHttpClientWagon.putFromStream
 (AbstractHttpClientWagon.java:710)
at org.eclipse.aether.transport.wagon.WagonTransporter$PutTaskRunner.run 
(WagonTransporter.java:605)
{code}
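
For illustration only, the retry-until-success behaviour visible in the log 
can be sketched generically in Java.  This is not Maven Wagon's 
implementation; the class, the method names, the doubling delay and the 
simulated upload are all assumptions made for the example:

```java
import java.util.function.Supplier;

/** Hypothetical retry helper; not Maven's or Wagon's implementation. */
class RetrySketch {

    /**
     * Runs the action up to maxAttempts times, doubling the delay after each
     * failure. Returns the first successful result or rethrows the last failure.
     */
    static <T> T retry(Supplier<T> action, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // no attempts left, give up
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
                delay *= 2;
            }
        }
        throw last;
    }

    /** Simulated upload that fails a fixed number of times before succeeding. */
    static Supplier<String> failingUpload(int failures) {
        int[] calls = {0};
        return () -> {
            if (calls[0]++ < failures) {
                throw new IllegalStateException("simulated 408 Request Timeout");
            }
            return "uploaded after " + calls[0] + " attempts";
        };
    }

    public static void main(String[] args) {
        System.out.println(retry(failingUpload(2), 5, 10));
    }
}
```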


> Azure failed due to unable to transfer maven artifacts
> --
>
> Key: FLINK-25168
> URL: https://issues.apache.org/jira/browse/FLINK-25168
> Project: Flink
> Issue Type: Bug
> Components: Build System / Azure Pipelines
> Affects Versions: 1.13.3, 1.15.0
> Reporter: Yun Gao
> Assignee: Chesnay Schepler
> Priority: Critical
> Labels: test-stability
>
> {code:java}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-deploy-plugin:2.8.2:deploy (default-deploy) on 
> project flink-tests: Failed to deploy artifacts: Could not transfer artifact 
> org.apache.flink:flink-tests:jar:1.13-20211205.020632-728 from/to 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots): Failed to 
> transfer file: 
> https://repository.apache.org/content/repositories/snapshots/org/apache/flink/flink-tests/1.13-SNAPSHOT/flink-tests-1.13-20211205.020632-728.jar.
>  Return code is: 502, ReasonPhrase: Proxy Error. -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> [ERROR] 
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :flink-tests
> ##[error]Bash exited with code '1'.
> Finishing: Deploy maven snapshot
>  {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=27560=logs=eca6b3a6-1600-56cc-916a-c549b3cde3ff=e9844b5e-5aa3-546b-6c3e-5395c7c0cac7=97156





[jira] [Commented] (FLINK-26644) python StreamExecutionEnvironmentTests.test_generate_stream_graph_with_dependencies failed on azure

2024-05-16 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847026#comment-17847026
 ] 

Ryan Skraba commented on FLINK-26644:
-

* 1.19 Java 8 / Test (module: python) 
https://github.com/apache/flink/actions/runs/910540/job/25031151177#step:10:24395

> python 
> StreamExecutionEnvironmentTests.test_generate_stream_graph_with_dependencies 
> failed on azure
> ---
>
> Key: FLINK-26644
> URL: https://issues.apache.org/jira/browse/FLINK-26644
> Project: Flink
> Issue Type: Bug
> Components: API / Python
> Affects Versions: 1.14.4, 1.15.0, 1.16.0, 1.19.0
> Reporter: Yun Gao
> Priority: Minor
> Labels: auto-deprioritized-major, test-stability
>
> {code:java}
> 2022-03-14T18:50:24.6842853Z Mar 14 18:50:24 
> === FAILURES 
> ===
> 2022-03-14T18:50:24.6844089Z Mar 14 18:50:24 _ 
> StreamExecutionEnvironmentTests.test_generate_stream_graph_with_dependencies _
> 2022-03-14T18:50:24.6844846Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6846063Z Mar 14 18:50:24 self = 
>   testMethod=test_generate_stream_graph_with_dependencies>
> 2022-03-14T18:50:24.6847104Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6847766Z Mar 14 18:50:24 def 
> test_generate_stream_graph_with_dependencies(self):
> 2022-03-14T18:50:24.6848677Z Mar 14 18:50:24 python_file_dir = 
> os.path.join(self.tempdir, "python_file_dir_" + str(uuid.uuid4()))
> 2022-03-14T18:50:24.6849833Z Mar 14 18:50:24 os.mkdir(python_file_dir)
> 2022-03-14T18:50:24.6850729Z Mar 14 18:50:24 python_file_path = 
> os.path.join(python_file_dir, "test_stream_dependency_manage_lib.py")
> 2022-03-14T18:50:24.6852679Z Mar 14 18:50:24 with 
> open(python_file_path, 'w') as f:
> 2022-03-14T18:50:24.6853646Z Mar 14 18:50:24 f.write("def 
> add_two(a):\nreturn a + 2")
> 2022-03-14T18:50:24.6854394Z Mar 14 18:50:24 env = self.env
> 2022-03-14T18:50:24.6855019Z Mar 14 18:50:24 
> env.add_python_file(python_file_path)
> 2022-03-14T18:50:24.6855519Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6856254Z Mar 14 18:50:24 def plus_two_map(value):
> 2022-03-14T18:50:24.6857045Z Mar 14 18:50:24 from 
> test_stream_dependency_manage_lib import add_two
> 2022-03-14T18:50:24.6857865Z Mar 14 18:50:24 return value[0], 
> add_two(value[1])
> 2022-03-14T18:50:24.6858466Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6858924Z Mar 14 18:50:24 def add_from_file(i):
> 2022-03-14T18:50:24.6859806Z Mar 14 18:50:24 with 
> open("data/data.txt", 'r') as f:
> 2022-03-14T18:50:24.6860266Z Mar 14 18:50:24 return i[0], 
> i[1] + int(f.read())
> 2022-03-14T18:50:24.6860879Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6862022Z Mar 14 18:50:24 from_collection_source = 
> env.from_collection([('a', 0), ('b', 0), ('c', 1), ('d', 1),
> 2022-03-14T18:50:24.6863259Z Mar 14 18:50:24  
>  ('e', 2)],
> 2022-03-14T18:50:24.6864057Z Mar 14 18:50:24  
> type_info=Types.ROW([Types.STRING(),
> 2022-03-14T18:50:24.6864651Z Mar 14 18:50:24  
>  Types.INT()]))
> 2022-03-14T18:50:24.6865150Z Mar 14 18:50:24 
> from_collection_source.name("From Collection")
> 2022-03-14T18:50:24.6866212Z Mar 14 18:50:24 keyed_stream = 
> from_collection_source.key_by(lambda x: x[1], key_type=Types.INT())
> 2022-03-14T18:50:24.6867083Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6867793Z Mar 14 18:50:24 plus_two_map_stream = 
> keyed_stream.map(plus_two_map).name("Plus Two Map").set_parallelism(3)
> 2022-03-14T18:50:24.6868620Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6869412Z Mar 14 18:50:24 add_from_file_map = 
> plus_two_map_stream.map(add_from_file).name("Add From File Map")
> 2022-03-14T18:50:24.6870239Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6870883Z Mar 14 18:50:24 test_stream_sink = 
> add_from_file_map.add_sink(self.test_sink).name("Test Sink")
> 2022-03-14T18:50:24.6871803Z Mar 14 18:50:24 
> test_stream_sink.set_parallelism(4)
> 2022-03-14T18:50:24.6872291Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6872756Z Mar 14 18:50:24 archive_dir_path = 
> os.path.join(self.tempdir, "archive_" + str(uuid.uuid4()))
> 2022-03-14T18:50:24.6873557Z Mar 14 18:50:24 
> os.mkdir(archive_dir_path)
> 2022-03-14T18:50:24.6874817Z Mar 14 18:50:24 with 
> open(os.path.join(archive_dir_path, "data.txt"), 'w') as f:
> 2022-03-14T18:50:24.6875414Z Mar 14 18:50:24 f.write("3")
> 2022-03-14T18:50:24.6875906Z Mar 14 

[jira] [Commented] (FLINK-35342) MaterializedTableStatementITCase test can check for wrong status

2024-05-16 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847025#comment-17847025
 ] 

Ryan Skraba commented on FLINK-35342:
-

I'm not entirely sure how this can be reproduced locally, but it can also be 
observed in GitHub Actions, if that helps!

* 1.20 Default (Java 8) / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9108427134/job/25040554694#step:10:12099
* 1.20 Default (Java 8) / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9105407291/job/25031170942#step:10:11841

> MaterializedTableStatementITCase test can check for wrong status
> 
>
> Key: FLINK-35342
> URL: https://issues.apache.org/jira/browse/FLINK-35342
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.20.0
>Reporter: Ryan Skraba
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> * 1.20 AdaptiveScheduler / Test (module: table) 
> https://github.com/apache/flink/actions/runs/9056197319/job/24879135605#step:10:12490
>  
> It looks like 
> {{MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume}}
>  can be flaky when the status is still CREATED at the point where the test expects RUNNING:
> {code}
> Error: 03:24:03 03:24:03.902 [ERROR] Tests run: 6, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 26.78 s <<< FAILURE! -- in 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase
> Error: 03:24:03 03:24:03.902 [ERROR] 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(Path,
>  RestClusterClient) -- Time elapsed: 3.850 s <<< FAILURE!
> May 13 03:24:03 org.opentest4j.AssertionFailedError: 
> May 13 03:24:03 
> May 13 03:24:03 expected: "RUNNING"
> May 13 03:24:03  but was: "CREATED"
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> May 13 03:24:03   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> May 13 03:24:03   at 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(MaterializedTableStatementITCase.java:650)
> May 13 03:24:03   at java.lang.reflect.Method.invoke(Method.java:498)
> May 13 03:24:03   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> May 13 03:24:03 
> May 13 03:24:04 03:24:04.270 [INFO] 
> May 13 03:24:04 03:24:04.270 [INFO] Results:
> May 13 03:24:04 03:24:04.270 [INFO] 
> Error: 03:24:04 03:24:04.270 [ERROR] Failures: 
> Error: 03:24:04 03:24:04.271 [ERROR]   
> MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume:650
>  
> May 13 03:24:04 expected: "RUNNING"
> May 13 03:24:04  but was: "CREATED"
> May 13 03:24:04 03:24:04.271 [INFO] 
> Error: 03:24:04 03:24:04.271 [ERROR] Tests run: 82, Failures: 1, Errors: 0, 
> Skipped: 0
> {code}
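
One common way to make a status check like the one above less timing-sensitive is to poll until the expected status appears, instead of asserting on a single read. The sketch below is only an illustration under that assumption and is not the change made for this issue; {{StatusPolling}}, {{waitForStatus}} and the simulated status supplier are hypothetical names, not Flink APIs.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class StatusPolling {

    /**
     * Polls the supplier until it returns the expected status, or throws
     * a TimeoutException once the timeout has elapsed.
     */
    public static void waitForStatus(
            Supplier<String> statusSupplier, String expected, Duration timeout)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeout.toNanos();
        String last = statusSupplier.get();
        while (!expected.equals(last)) {
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException(
                        "expected: \"" + expected + "\" but was: \"" + last + "\"");
            }
            Thread.sleep(10);
            last = statusSupplier.get();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical job: reports CREATED for the first three reads, then RUNNING.
        AtomicInteger reads = new AtomicInteger();
        Supplier<String> status =
                () -> reads.incrementAndGet() < 4 ? "CREATED" : "RUNNING";

        waitForStatus(status, "RUNNING", Duration.ofSeconds(5));
        System.out.println("status reached RUNNING after " + reads.get() + " reads");
    }
}
```

With a single immediate read the simulated job above would report CREATED, which matches the failure in the log; the polling variant only fails if RUNNING is never reached within the timeout.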





[jira] [Commented] (FLINK-35380) ResumeCheckpointManuallyITCase hanging on tests

2024-05-16 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847022#comment-17847022
 ] 

Ryan Skraba commented on FLINK-35380:
-

* 1.20 Default (Java 8) / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9108427134/job/25040554694#step:10:12099

> ResumeCheckpointManuallyITCase hanging on tests 
> 
>
> Key: FLINK-35380
> URL: https://issues.apache.org/jira/browse/FLINK-35380
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.20.0
>Reporter: Ryan Skraba
>Priority: Critical
>  Labels: test-stability
>
> * 1.20 Default (Java 8) / Test (module: tests) 
> https://github.com/apache/flink/actions/runs/9105407291/job/25031170942#step:10:11841
>  
> (This is a slightly different error, waiting in a different place than 
> FLINK-28319)
> {code}
> May 16 03:23:58 
> ==
> May 16 03:23:58 Process produced no output for 900 seconds.
> May 16 03:23:58 
> ==
> ... snip until stack trace ...
> May 16 03:23:58   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> May 16 03:23:58   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> May 16 03:23:58   at 
> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:410)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:378)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:318)
> May 16 03:23:58   at 
> org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFullRocksDBCheckpointsWithLocalRecoveryStandalone(ResumeCheckpointManuallyITCase.java:133)
> {code}





[jira] [Commented] (FLINK-35382) ChangelogCompatibilityITCase.testRestore fails with an NPE

2024-05-16 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847020#comment-17847020
 ] 

Ryan Skraba commented on FLINK-35382:
-

[~lijinzhong] Do you think this is related to the changes made in FLINK-32080?

> ChangelogCompatibilityITCase.testRestore fails with an NPE
> --
>
> Key: FLINK-35382
> URL: https://issues.apache.org/jira/browse/FLINK-35382
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.20.0
>Reporter: Ryan Skraba
>Priority: Critical
>  Labels: test-stability
>
> * 1.20 Java 8 / Test (module: tests) 
> https://github.com/apache/flink/actions/runs/9110398985/job/25045798401#step:10:8192
> It looks like there can be a [NullPointerException at this 
> line|https://github.com/apache/flink/blob/9a5a99b1a30054268bbde36d565cbb1b81018890/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/filemerging/FileMergingSnapshotManagerBase.java#L666]
>  causing a test failure:
> {code}
> Error: 10:36:23 10:36:23.312 [ERROR] Tests run: 9, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 19.31 s <<< FAILURE! -- in 
> org.apache.flink.test.state.ChangelogCompatibilityITCase
> Error: 10:36:23 10:36:23.313 [ERROR] 
> org.apache.flink.test.state.ChangelogCompatibilityITCase.testRestore[startWithChangelog=false,
>  restoreWithChangelog=true, restoreFrom=CHECKPOINT, allowStore=true, 
> allowRestore=true] -- Time elapsed: 1.492 s <<< ERROR!
> May 16 10:36:23 java.lang.RuntimeException: 
> org.opentest4j.AssertionFailedError: Graph is in globally terminal state 
> (FAILED)
> May 16 10:36:23   at 
> org.apache.flink.test.state.ChangelogCompatibilityITCase.tryRun(ChangelogCompatibilityITCase.java:204)
> May 16 10:36:23   at 
> org.apache.flink.test.state.ChangelogCompatibilityITCase.restoreAndValidate(ChangelogCompatibilityITCase.java:190)
> May 16 10:36:23   at java.util.Optional.ifPresent(Optional.java:159)
> May 16 10:36:23   at 
> org.apache.flink.test.state.ChangelogCompatibilityITCase.testRestore(ChangelogCompatibilityITCase.java:118)
> May 16 10:36:23   at java.lang.reflect.Method.invoke(Method.java:498)
> May 16 10:36:23 Caused by: org.opentest4j.AssertionFailedError: Graph is in 
> globally terminal state (FAILED)
> May 16 10:36:23   at 
> org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:42)
> May 16 10:36:23   at 
> org.junit.jupiter.api.Assertions.fail(Assertions.java:150)
> May 16 10:36:23   at 
> org.apache.flink.runtime.testutils.CommonTestUtils.lambda$waitForAllTaskRunning$3(CommonTestUtils.java:214)
> May 16 10:36:23   at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:151)
> May 16 10:36:23   at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145)
> May 16 10:36:23   at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitForAllTaskRunning(CommonTestUtils.java:209)
> May 16 10:36:23   at 
> org.apache.flink.runtime.testutils.CommonTestUtils.waitForAllTaskRunning(CommonTestUtils.java:182)
> May 16 10:36:23   at 
> org.apache.flink.test.state.ChangelogCompatibilityITCase.submit(ChangelogCompatibilityITCase.java:284)
> May 16 10:36:23   at 
> org.apache.flink.test.state.ChangelogCompatibilityITCase.tryRun(ChangelogCompatibilityITCase.java:197)
> May 16 10:36:23   ... 4 more
> May 16 10:36:23 Caused by: org.apache.flink.runtime.JobException: 
> org.apache.flink.runtime.JobException: Recovery is suppressed by 
> NoRestartBackoffTimeStrategy
> May 16 10:36:23   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:219)
> May 16 10:36:23   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailureAndReport(ExecutionFailureHandler.java:166)
> May 16 10:36:23   at 
> org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:121)
> May 16 10:36:23   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.recordTaskFailure(DefaultScheduler.java:279)
> May 16 10:36:23   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:270)
> May 16 10:36:23   at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.onTaskFailed(DefaultScheduler.java:263)
> May 16 10:36:23   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.onTaskExecutionStateUpdate(SchedulerBase.java:788)
> May 16 10:36:23   at 
> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:765)
> May 16 10:36:23   at 
> org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:83)
> May 16 10:36:23  

[jira] [Created] (FLINK-35382) ChangelogCompatibilityITCase.testRestore fails with an NPE

2024-05-16 Thread Ryan Skraba (Jira)
Ryan Skraba created FLINK-35382:
---

 Summary: ChangelogCompatibilityITCase.testRestore fails with an NPE
 Key: FLINK-35382
 URL: https://issues.apache.org/jira/browse/FLINK-35382
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.20.0
Reporter: Ryan Skraba


* 1.20 Java 8 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9110398985/job/25045798401#step:10:8192

It looks like there can be a [NullPointerException at this 
line|https://github.com/apache/flink/blob/9a5a99b1a30054268bbde36d565cbb1b81018890/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/filemerging/FileMergingSnapshotManagerBase.java#L666]
 causing a test failure:

{code}
Error: 10:36:23 10:36:23.312 [ERROR] Tests run: 9, Failures: 0, Errors: 1, 
Skipped: 0, Time elapsed: 19.31 s <<< FAILURE! -- in 
org.apache.flink.test.state.ChangelogCompatibilityITCase
Error: 10:36:23 10:36:23.313 [ERROR] 
org.apache.flink.test.state.ChangelogCompatibilityITCase.testRestore[startWithChangelog=false,
 restoreWithChangelog=true, restoreFrom=CHECKPOINT, allowStore=true, 
allowRestore=true] -- Time elapsed: 1.492 s <<< ERROR!
May 16 10:36:23 java.lang.RuntimeException: 
org.opentest4j.AssertionFailedError: Graph is in globally terminal state 
(FAILED)
May 16 10:36:23 at 
org.apache.flink.test.state.ChangelogCompatibilityITCase.tryRun(ChangelogCompatibilityITCase.java:204)
May 16 10:36:23 at 
org.apache.flink.test.state.ChangelogCompatibilityITCase.restoreAndValidate(ChangelogCompatibilityITCase.java:190)
May 16 10:36:23 at java.util.Optional.ifPresent(Optional.java:159)
May 16 10:36:23 at 
org.apache.flink.test.state.ChangelogCompatibilityITCase.testRestore(ChangelogCompatibilityITCase.java:118)
May 16 10:36:23 at java.lang.reflect.Method.invoke(Method.java:498)
May 16 10:36:23 Caused by: org.opentest4j.AssertionFailedError: Graph is in 
globally terminal state (FAILED)
May 16 10:36:23 at 
org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:42)
May 16 10:36:23 at 
org.junit.jupiter.api.Assertions.fail(Assertions.java:150)
May 16 10:36:23 at 
org.apache.flink.runtime.testutils.CommonTestUtils.lambda$waitForAllTaskRunning$3(CommonTestUtils.java:214)
May 16 10:36:23 at 
org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:151)
May 16 10:36:23 at 
org.apache.flink.runtime.testutils.CommonTestUtils.waitUntilCondition(CommonTestUtils.java:145)
May 16 10:36:23 at 
org.apache.flink.runtime.testutils.CommonTestUtils.waitForAllTaskRunning(CommonTestUtils.java:209)
May 16 10:36:23 at 
org.apache.flink.runtime.testutils.CommonTestUtils.waitForAllTaskRunning(CommonTestUtils.java:182)
May 16 10:36:23 at 
org.apache.flink.test.state.ChangelogCompatibilityITCase.submit(ChangelogCompatibilityITCase.java:284)
May 16 10:36:23 at 
org.apache.flink.test.state.ChangelogCompatibilityITCase.tryRun(ChangelogCompatibilityITCase.java:197)
May 16 10:36:23 ... 4 more
May 16 10:36:23 Caused by: org.apache.flink.runtime.JobException: 
org.apache.flink.runtime.JobException: Recovery is suppressed by 
NoRestartBackoffTimeStrategy
May 16 10:36:23 at 
org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:219)
May 16 10:36:23 at 
org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.handleFailureAndReport(ExecutionFailureHandler.java:166)
May 16 10:36:23 at 
org.apache.flink.runtime.executiongraph.failover.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:121)
May 16 10:36:23 at 
org.apache.flink.runtime.scheduler.DefaultScheduler.recordTaskFailure(DefaultScheduler.java:279)
May 16 10:36:23 at 
org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:270)
May 16 10:36:23 at 
org.apache.flink.runtime.scheduler.DefaultScheduler.onTaskFailed(DefaultScheduler.java:263)
May 16 10:36:23 at 
org.apache.flink.runtime.scheduler.SchedulerBase.onTaskExecutionStateUpdate(SchedulerBase.java:788)
May 16 10:36:23 at 
org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:765)
May 16 10:36:23 at 
org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:83)
May 16 10:36:23 at 
org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:496)
May 16 10:36:23 at java.lang.reflect.Method.invoke(Method.java:498)
May 16 10:36:23 at 
org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRpcInvocation$1(PekkoRpcActor.java:318)
May 16 10:36:23 at 
org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
May 16 

[jira] [Created] (FLINK-35381) LocalRecoveryITCase failure on deleting directory

2024-05-16 Thread Ryan Skraba (Jira)
Ryan Skraba created FLINK-35381:
---

 Summary: LocalRecoveryITCase failure on deleting directory
 Key: FLINK-35381
 URL: https://issues.apache.org/jira/browse/FLINK-35381
 Project: Flink
  Issue Type: Bug
Reporter: Ryan Skraba


* 1.20 Java 11 / Test (module: tests) 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=54856=logs=a9db68b9-a7e0-54b6-0f98-010e0aff39e2=cdd32e0b-6047-565b-c58f-14054472f1be=11288

It looks like some resources in a subdirectory of a JUnit4 {{ClassRule}} temp 
directory prevent it from being cleaned up.  This was fixed in a different test 
in FLINK-33641.

{code}
SEVERE: Caught exception while closing extension context: 
org.junit.jupiter.engine.descriptor.MethodExtensionContext@2fc91366
java.io.IOException: Failed to delete temp directory 
/tmp/junit7935976901063386613. The following paths could not be deleted (see 
suppressed exceptions for details): 
tm_taskManager_0/localState/aid_1501e77149be2f931eab0a6c2e818f81/jid_fe61a39afa9873389353abb8bfbfba66/vtx_0a448493b4782967b150582570326227_sti_0,
 
tm_taskManager_0/localState/aid_1501e77149be2f931eab0a6c2e818f81/jid_fe61a39afa9873389353abb8bfbfba66/vtx_bc764cd8ddf7a0cff126f51c16239658_sti_0/chk_51
at 
org.junit.jupiter.engine.extension.TempDirectory$CloseablePath.createIOExceptionWithAttachedFailures(TempDirectory.java:431)
at 
org.junit.jupiter.engine.extension.TempDirectory$CloseablePath.close(TempDirectory.java:312)
at 
org.junit.jupiter.engine.descriptor.AbstractExtensionContext.lambda$static$0(AbstractExtensionContext.java:45)
at 
org.junit.platform.engine.support.store.NamespacedHierarchicalStore$EvaluatedValue.close(NamespacedHierarchicalStore.java:333)
at 
org.junit.platform.engine.support.store.NamespacedHierarchicalStore$EvaluatedValue.access$800(NamespacedHierarchicalStore.java:317)
at 
org.junit.platform.engine.support.store.NamespacedHierarchicalStore.lambda$close$3(NamespacedHierarchicalStore.java:98)
at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
org.junit.platform.engine.support.store.NamespacedHierarchicalStore.lambda$close$4(NamespacedHierarchicalStore.java:98)
at 
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
at 
java.base/java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:395)
at java.base/java.util.stream.Sink$ChainedReference.end(Sink.java:258)
at java.base/java.util.stream.Sink$ChainedReference.end(Sink.java:258)
at 
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
at 
java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at 
java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at 
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at 
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
at 
org.junit.platform.engine.support.store.NamespacedHierarchicalStore.close(NamespacedHierarchicalStore.java:98)
at 
org.junit.jupiter.engine.descriptor.AbstractExtensionContext.close(AbstractExtensionContext.java:87)
at 
org.junit.jupiter.engine.execution.JupiterEngineExecutionContext.close(JupiterEngineExecutionContext.java:53)
at 
org.junit.jupiter.engine.descriptor.JupiterTestDescriptor.cleanUp(JupiterTestDescriptor.java:224)
at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$cleanUp$1(TestMethodTestDescriptor.java:156)
at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.cleanUp(TestMethodTestDescriptor.java:156)
at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.cleanUp(TestMethodTestDescriptor.java:69)
at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$cleanUp$10(NodeTestTask.java:167)
at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.cleanUp(NodeTestTask.java:167)
at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:98)
at 
org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService$ExclusiveTask.compute(ForkJoinPoolHierarchicalTestExecutorService.java:202)
at 

[jira] [Created] (FLINK-35380) ResumeCheckpointManuallyITCase hanging on tests

2024-05-16 Thread Ryan Skraba (Jira)
Ryan Skraba created FLINK-35380:
---

 Summary: ResumeCheckpointManuallyITCase hanging on tests 
 Key: FLINK-35380
 URL: https://issues.apache.org/jira/browse/FLINK-35380
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.20.0
Reporter: Ryan Skraba


* 1.20 Default (Java 8) / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9105407291/job/25031170942#step:10:11841
 

(This is a slightly different error, waiting in a different place than 
FLINK-28319)

{code}
May 16 03:23:58 
==
May 16 03:23:58 Process produced no output for 900 seconds.
May 16 03:23:58 
==

... snip until stack trace ...

May 16 03:23:58 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
May 16 03:23:58 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
May 16 03:23:58 at 
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
May 16 03:23:58 at 
org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.runJobAndGetExternalizedCheckpoint(ResumeCheckpointManuallyITCase.java:410)
May 16 03:23:58 at 
org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:378)
May 16 03:23:58 at 
org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedCheckpoints(ResumeCheckpointManuallyITCase.java:318)
May 16 03:23:58 at 
org.apache.flink.test.checkpointing.ResumeCheckpointManuallyITCase.testExternalizedFullRocksDBCheckpointsWithLocalRecoveryStandalone(ResumeCheckpointManuallyITCase.java:133)
{code}
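
The trace above ends in an unbounded {{CountDownLatch.await()}}, so the test only stops when the CI watchdog reports 900 seconds without output. As a hedged illustration (not the fix for this issue), a bounded wait fails fast with a message instead; {{BoundedLatchWait}} and {{awaitOrFail}} are hypothetical names, while {{CountDownLatch.await(long, TimeUnit)}} is the standard JDK method.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BoundedLatchWait {

    /**
     * Waits for the latch to reach zero. Returns normally if that happens
     * within the timeout, otherwise throws IllegalStateException.
     */
    public static void awaitOrFail(CountDownLatch latch, long timeout, TimeUnit unit)
            throws InterruptedException {
        if (!latch.await(timeout, unit)) {
            throw new IllegalStateException(
                    "latch not released within " + timeout + " " + unit
                            + ", remaining count: " + latch.getCount());
        }
    }

    public static void main(String[] args) throws Exception {
        // A latch that has already been counted down: the wait returns at once.
        CountDownLatch released = new CountDownLatch(1);
        released.countDown();
        awaitOrFail(released, 1, TimeUnit.SECONDS);

        // A latch that is never counted down: the wait gives up after the timeout.
        CountDownLatch stuck = new CountDownLatch(1);
        try {
            awaitOrFail(stuck, 100, TimeUnit.MILLISECONDS);
        } catch (IllegalStateException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

A bounded wait does not explain why the latch is never released; it only turns a hang into a test failure that names the stuck wait.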





[jira] [Commented] (FLINK-35284) Streaming File Sink end-to-end test times out

2024-05-16 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846875#comment-17846875
 ] 

Ryan Skraba commented on FLINK-35284:
-

* 1.20 e2e_2_cron_adaptive_scheduler 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59583=logs=fb37c667-81b7-5c22-dd91-846535e99a97=011e961e-597c-5c96-04fe-7941c8b83f23=3098

> Streaming File Sink end-to-end test times out
> -
>
> Key: FLINK-35284
> URL: https://issues.apache.org/jira/browse/FLINK-35284
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.20.0
>Reporter: Ryan Skraba
>Priority: Major
>  Labels: test-stability
>
> 1.20 e2e_2_cron_adaptive_scheduler 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59303=logs=fb37c667-81b7-5c22-dd91-846535e99a97=011e961e-597c-5c96-04fe-7941c8b83f23=3076
> {code}
> May 01 01:08:42 Test (pid: 127498) did not finish after 900 seconds.
> May 01 01:08:42 Printing Flink logs and killing it:
> {code}
> This looks like a consequence of hundreds of 
> {{RecipientUnreachableException}}s like: 
> {code}
> 2024-05-01 00:55:00,496 WARN  
> org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer 
> [] - Slot allocation for allocation 2ec550d8331cd53c32fd899e1e9a0fa5 for job 
> 5654b195450b352be998673f1637fc43 failed.
> org.apache.flink.runtime.rpc.exceptions.RecipientUnreachableException: Could 
> not send message [RemoteRpcInvocation(TaskExecutorGateway.requestSlot(SlotID, 
> JobID, AllocationID, ResourceProfile, String, ResourceManagerId, Time))] from 
> sender [Actor[pekko://flink/temp/taskmanager_0$De]] to recipient 
> [Actor[pekko.ssl.tcp://flink@localhost:40665/user/rpc/taskmanager_0#-299862847]],
>  because the recipient is unreachable. This can either mean that the 
> recipient has been terminated or that the remote RpcService is currently not 
> reachable.
>   at 
> org.apache.flink.runtime.rpc.pekko.DeadLettersActor.handleDeadLetter(DeadLettersActor.java:61)
>  ~[flink-rpc-akkafe85d469-8ced-4732-922e-62c82b554871.jar:1.20-SNAPSHOT]
>   at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) 
> ~[flink-rpc-akkafe85d469-8ced-4732-922e-62c82b554871.jar:1.20-SNAPSHOT]
>   at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) 
> ~[flink-rpc-akkafe85d469-8ced-4732-922e-62c82b554871.jar:1.20-SNAPSHOT]
>   at scala.PartialFunction.applyOrElse(PartialFunction.scala:127) 
> ~[flink-rpc-akkafe85d469-8ced-4732-922e-62c82b554871.jar:1.20-SNAPSHOT]
> {code}





[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP

2024-05-16 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846874#comment-17846874
 ] 

Ryan Skraba commented on FLINK-33186:
-

* 1.18 AdaptiveScheduler / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9088951392/job/24979573762#step:10:7852

>  CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished 
> fails on AZP
> -
>
> Key: FLINK-33186
> URL: https://issues.apache.org/jira/browse/FLINK-33186
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.19.0, 1.18.1
>Reporter: Sergey Nuyanzin
>Assignee: Jiang Xin
>Priority: Critical
>  Labels: test-stability
>
> This build 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53509=logs=baf26b34-3c6a-54e8-f93f-cf269b32f802=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9=8762
> fails as
> {noformat}
> Sep 28 01:23:43 Caused by: 
> org.apache.flink.runtime.checkpoint.CheckpointException: Task local 
> checkpoint failure.
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.PendingCheckpoint.abort(PendingCheckpoint.java:550)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2248)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2235)
> Sep 28 01:23:43   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:817)
> Sep 28 01:23:43   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> Sep 28 01:23:43   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Sep 28 01:23:43   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> Sep 28 01:23:43   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> Sep 28 01:23:43   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> Sep 28 01:23:43   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> Sep 28 01:23:43   at java.lang.Thread.run(Thread.java:748)
> {noformat}





[jira] [Commented] (FLINK-35342) MaterializedTableStatementITCase test can check for wrong status

2024-05-16 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846872#comment-17846872
 ] 

Ryan Skraba commented on FLINK-35342:
-

* 1.20 test_cron_adaptive_scheduler table 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59583=logs=f2c100be-250b-5e85-7bbe-176f68fcddc5=05efd11e-5400-54a4-0d27-a4663be008a9=12764
* 1.20 test_cron_adaptive_scheduler table 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59558=logs=f2c100be-250b-5e85-7bbe-176f68fcddc5=05efd11e-5400-54a4-0d27-a4663be008a9=12764

> MaterializedTableStatementITCase test can check for wrong status
> 
>
> Key: FLINK-35342
> URL: https://issues.apache.org/jira/browse/FLINK-35342
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.20.0
>Reporter: Ryan Skraba
>Priority: Critical
>  Labels: test-stability
>
> * 1.20 AdaptiveScheduler / Test (module: table) 
> https://github.com/apache/flink/actions/runs/9056197319/job/24879135605#step:10:12490
>  
> It looks like 
> {{MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume}}
>  can be flaky when the status is still CREATED at the point where the test expects RUNNING:
> {code}
> Error: 03:24:03 03:24:03.902 [ERROR] Tests run: 6, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 26.78 s <<< FAILURE! -- in 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase
> Error: 03:24:03 03:24:03.902 [ERROR] 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(Path,
>  RestClusterClient) -- Time elapsed: 3.850 s <<< FAILURE!
> May 13 03:24:03 org.opentest4j.AssertionFailedError: 
> May 13 03:24:03 
> May 13 03:24:03 expected: "RUNNING"
> May 13 03:24:03  but was: "CREATED"
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> May 13 03:24:03   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> May 13 03:24:03   at 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(MaterializedTableStatementITCase.java:650)
> May 13 03:24:03   at java.lang.reflect.Method.invoke(Method.java:498)
> May 13 03:24:03   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> May 13 03:24:03 
> May 13 03:24:04 03:24:04.270 [INFO] 
> May 13 03:24:04 03:24:04.270 [INFO] Results:
> May 13 03:24:04 03:24:04.270 [INFO] 
> Error: 03:24:04 03:24:04.270 [ERROR] Failures: 
> Error: 03:24:04 03:24:04.271 [ERROR]   
> MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume:650
>  
> May 13 03:24:04 expected: "RUNNING"
> May 13 03:24:04  but was: "CREATED"
> May 13 03:24:04 03:24:04.271 [INFO] 
> Error: 03:24:04 03:24:04.271 [ERROR] Tests run: 82, Failures: 1, Errors: 0, 
> Skipped: 0
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (FLINK-35342) MaterializedTableStatementITCase test can check for wrong status

2024-05-16 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846872#comment-17846872
 ] 

Ryan Skraba edited comment on FLINK-35342 at 5/16/24 8:40 AM:
--

It looks like a related problem is occurring with {{testDropMaterializedTable}}:

* 1.20 test_cron_adaptive_scheduler table 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59583=logs=f2c100be-250b-5e85-7bbe-176f68fcddc5=05efd11e-5400-54a4-0d27-a4663be008a9=12764
* 1.20 test_cron_adaptive_scheduler table 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59558=logs=f2c100be-250b-5e85-7bbe-176f68fcddc5=05efd11e-5400-54a4-0d27-a4663be008a9=12764


was (Author: ryanskraba):
* 1.20 test_cron_adaptive_scheduler table 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59583=logs=f2c100be-250b-5e85-7bbe-176f68fcddc5=05efd11e-5400-54a4-0d27-a4663be008a9=12764
* 1.20 test_cron_adaptive_scheduler table 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59558=logs=f2c100be-250b-5e85-7bbe-176f68fcddc5=05efd11e-5400-54a4-0d27-a4663be008a9=12764

> MaterializedTableStatementITCase test can check for wrong status
> 
>
> Key: FLINK-35342
> URL: https://issues.apache.org/jira/browse/FLINK-35342
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.20.0
> Reporter: Ryan Skraba
> Priority: Critical
> Labels: test-stability
>
> * 1.20 AdaptiveScheduler / Test (module: table) 
> https://github.com/apache/flink/actions/runs/9056197319/job/24879135605#step:10:12490
>  
> It looks like 
> {{MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume}}
>  can be flaky, where the expected status is not yet RUNNING:
> {code}
> Error: 03:24:03 03:24:03.902 [ERROR] Tests run: 6, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 26.78 s <<< FAILURE! -- in 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase
> Error: 03:24:03 03:24:03.902 [ERROR] 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(Path,
>  RestClusterClient) -- Time elapsed: 3.850 s <<< FAILURE!
> May 13 03:24:03 org.opentest4j.AssertionFailedError: 
> May 13 03:24:03 
> May 13 03:24:03 expected: "RUNNING"
> May 13 03:24:03  but was: "CREATED"
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> May 13 03:24:03   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> May 13 03:24:03   at 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(MaterializedTableStatementITCase.java:650)
> May 13 03:24:03   at java.lang.reflect.Method.invoke(Method.java:498)
> May 13 03:24:03   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> May 13 03:24:03 
> May 13 03:24:04 03:24:04.270 [INFO] 
> May 13 03:24:04 03:24:04.270 [INFO] Results:
> May 13 03:24:04 03:24:04.270 [INFO] 
> Error: 03:24:04 03:24:04.270 [ERROR] Failures: 
> Error: 03:24:04 03:24:04.271 [ERROR]   
> MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume:650
>  
> May 13 03:24:04 expected: "RUNNING"
> May 13 03:24:04  but was: "CREATED"
> May 13 03:24:04 03:24:04.271 [INFO] 
> Error: 03:24:04 03:24:04.271 [ERROR] Tests run: 82, Failures: 1, Errors: 0, 
> Skipped: 0
> {code}





[jira] [Commented] (FLINK-34405) RightOuterJoinTaskTest#testCancelOuterJoinTaskWhileSort2 fails due to an interruption of the RightOuterJoinDriver#prepare method

2024-05-16 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846871#comment-17846871
 ] 

Ryan Skraba commented on FLINK-34405:
-

* 1.20 test_cron_adaptive_scheduler core 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59558=logs=0e7be18f-84f2-53f0-a32d-4a5e4a174679=7c1d86e3-35bd-5fd5-3b7c-30c126a78702=8662

> RightOuterJoinTaskTest#testCancelOuterJoinTaskWhileSort2 fails due to an 
> interruption of the RightOuterJoinDriver#prepare method
> 
>
> Key: FLINK-34405
> URL: https://issues.apache.org/jira/browse/FLINK-34405
> Project: Flink
> Issue Type: Bug
> Components: API / Core
> Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0
> Reporter: Matthias Pohl
> Priority: Critical
> Labels: pull-request-available, starter, test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57357=logs=d89de3df-4600-5585-dadc-9bbc9a5e661c=be5a4b15-4b23-56b1-7582-795f58a645a2=9027
> {code}
> Feb 07 03:20:16 03:20:16.223 [ERROR] Failures: 
> Feb 07 03:20:16 03:20:16.223 [ERROR] 
> org.apache.flink.runtime.operators.RightOuterJoinTaskTest.testCancelOuterJoinTaskWhileSort2
> Feb 07 03:20:16 03:20:16.223 [ERROR]   Run 1: 
> RightOuterJoinTaskTest>AbstractOuterJoinTaskTest.testCancelOuterJoinTaskWhileSort2:435
>  
> Feb 07 03:20:16 expected: 
> Feb 07 03:20:16   null
> Feb 07 03:20:16  but was: 
> Feb 07 03:20:16   java.lang.Exception: The data preparation caused an error: 
> Interrupted
> Feb 07 03:20:16   at 
> org.apache.flink.runtime.operators.testutils.BinaryOperatorTestBase.testDriverInternal(BinaryOperatorTestBase.java:209)
> Feb 07 03:20:16   at 
> org.apache.flink.runtime.operators.testutils.BinaryOperatorTestBase.testDriver(BinaryOperatorTestBase.java:189)
> Feb 07 03:20:16   at 
> org.apache.flink.runtime.operators.AbstractOuterJoinTaskTest.access$100(AbstractOuterJoinTaskTest.java:48)
> Feb 07 03:20:16   ...(1 remaining lines not displayed - this can be 
> changed with Assertions.setMaxStackTraceElementsDisplayed)
> {code}





[jira] [Commented] (FLINK-25168) Azure failed due to unable to transfer maven artifacts

2024-05-16 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-25168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846870#comment-17846870
 ] 

Ryan Skraba commented on FLINK-25168:
-

* 1.19 cron_snapshot_deployment_maven 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59560=logs=eca6b3a6-1600-56cc-916a-c549b3cde3ff=7b3c1df5-9194-5183-5ebd-5567f52d5f8f=3018
* 1.20 cron_snapshot_deployment_maven 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59558=logs=eca6b3a6-1600-56cc-916a-c549b3cde3ff=7b3c1df5-9194-5183-5ebd-5567f52d5f8f=478

This is a slightly different error message, but almost certainly the same
transient connection problem between CI and the ASF repo.
{code}
00:22:04.734 [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-deploy-plugin:2.8.2:deploy (default-deploy) on 
project flink-annotations: Failed to deploy artifacts: Could not transfer 
artifact org.apache.flink:flink-annotations:jar:1.20-20240515.001805-85 from/to 
apache.snapshots.https 
(https://repository.apache.org/content/repositories/snapshots): transfer failed 
for 
https://repository.apache.org/content/repositories/snapshots/org/apache/flink/flink-annotations/1.20-SNAPSHOT/flink-annotations-1.20-20240515.001805-85.jar:
 Connect to repository.apache.org:443 [repository.apache.org/65.109.119.155] 
failed: Connection refused (Connection refused) -> [Help 1]
00:22:04.734 [ERROR] 
{code}

> Azure failed due to unable to transfer maven artifacts
> --
>
> Key: FLINK-25168
> URL: https://issues.apache.org/jira/browse/FLINK-25168
> Project: Flink
> Issue Type: Bug
> Components: Build System / Azure Pipelines
> Affects Versions: 1.13.3, 1.15.0
> Reporter: Yun Gao
> Assignee: Chesnay Schepler
> Priority: Critical
> Labels: test-stability
>
> {code:java}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-deploy-plugin:2.8.2:deploy (default-deploy) on 
> project flink-tests: Failed to deploy artifacts: Could not transfer artifact 
> org.apache.flink:flink-tests:jar:1.13-20211205.020632-728 from/to 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots): Failed to 
> transfer file: 
> https://repository.apache.org/content/repositories/snapshots/org/apache/flink/flink-tests/1.13-SNAPSHOT/flink-tests-1.13-20211205.020632-728.jar.
>  Return code is: 502, ReasonPhrase: Proxy Error. -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> [ERROR] 
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :flink-tests
> ##[error]Bash exited with code '1'.
> Finishing: Deploy maven snapshot
>  {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=27560=logs=eca6b3a6-1600-56cc-916a-c549b3cde3ff=e9844b5e-5aa3-546b-6c3e-5395c7c0cac7=97156





[jira] [Commented] (FLINK-35342) MaterializedTableStatementITCase test can check for wrong status

2024-05-14 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846319#comment-17846319
 ] 

Ryan Skraba commented on FLINK-35342:
-

Thank you for the fix!

Just to be complete, this failure occurred before the fix was merged:

* 1.20 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/9072668322/job/24928769693#step:10:12490

> MaterializedTableStatementITCase test can check for wrong status
> 
>
> Key: FLINK-35342
> URL: https://issues.apache.org/jira/browse/FLINK-35342
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.20.0
> Reporter: Ryan Skraba
> Priority: Critical
> Labels: test-stability
>
> * 1.20 AdaptiveScheduler / Test (module: table) 
> https://github.com/apache/flink/actions/runs/9056197319/job/24879135605#step:10:12490
>  
> It looks like 
> {{MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume}}
>  can be flaky, where the expected status is not yet RUNNING:
> {code}
> Error: 03:24:03 03:24:03.902 [ERROR] Tests run: 6, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 26.78 s <<< FAILURE! -- in 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase
> Error: 03:24:03 03:24:03.902 [ERROR] 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(Path,
>  RestClusterClient) -- Time elapsed: 3.850 s <<< FAILURE!
> May 13 03:24:03 org.opentest4j.AssertionFailedError: 
> May 13 03:24:03 
> May 13 03:24:03 expected: "RUNNING"
> May 13 03:24:03  but was: "CREATED"
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> May 13 03:24:03   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> May 13 03:24:03   at 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(MaterializedTableStatementITCase.java:650)
> May 13 03:24:03   at java.lang.reflect.Method.invoke(Method.java:498)
> May 13 03:24:03   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> May 13 03:24:03 
> May 13 03:24:04 03:24:04.270 [INFO] 
> May 13 03:24:04 03:24:04.270 [INFO] Results:
> May 13 03:24:04 03:24:04.270 [INFO] 
> Error: 03:24:04 03:24:04.270 [ERROR] Failures: 
> Error: 03:24:04 03:24:04.271 [ERROR]   
> MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume:650
>  
> May 13 03:24:04 expected: "RUNNING"
> May 13 03:24:04  but was: "CREATED"
> May 13 03:24:04 03:24:04.271 [INFO] 
> Error: 03:24:04 03:24:04.271 [ERROR] Tests run: 82, Failures: 1, Errors: 0, 
> Skipped: 0
> {code}





[jira] [Commented] (FLINK-35002) GitHub action request timeout to ArtifactService

2024-05-14 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846318#comment-17846318
 ] 

Ryan Skraba commented on FLINK-35002:
-

* 1.18 Default (Java 8) / Test (module: tests) 
https://github.com/apache/flink/commit/1f604da2dfc831d04826a20b3cb272d2ad9dfb56/checks/24935906143/logs

> GitHub action request timeout to ArtifactService
> -
>
> Key: FLINK-35002
> URL: https://issues.apache.org/jira/browse/FLINK-35002
> Project: Flink
> Issue Type: Bug
> Components: Build System
> Reporter: Ryan Skraba
> Priority: Major
> Labels: github-actions, test-stability
>
> A timeout can occur when uploading a successfully built artifact:
>  * [https://github.com/apache/flink/actions/runs/8516411871/job/23325392650]
> {code:java}
> 2024-04-02T02:20:15.6355368Z With the provided path, there will be 1 file 
> uploaded
> 2024-04-02T02:20:15.6360133Z Artifact name is valid!
> 2024-04-02T02:20:15.6362872Z Root directory input is valid!
> 2024-04-02T02:20:20.6975036Z Attempt 1 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 3000 ms...
> 2024-04-02T02:20:28.7084937Z Attempt 2 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 4785 ms...
> 2024-04-02T02:20:38.5015936Z Attempt 3 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 7375 ms...
> 2024-04-02T02:20:50.8901508Z Attempt 4 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 14988 ms...
> 2024-04-02T02:21:10.9028438Z ##[error]Failed to CreateArtifact: Failed to 
> make request after 5 attempts: Request timeout: 
> /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact
> 2024-04-02T02:22:59.9893296Z Post job cleanup.
> 2024-04-02T02:22:59.9958844Z Post job cleanup. {code}
> (This is unlikely to be something we can fix, but we can track it.)
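
The log above shows five attempts with a growing delay between them (3000, 4785, 7375 and 14988 ms). A minimal Java sketch of that retry pattern follows. It assumes multiplicative backoff with random jitter, which is only a guess at the shape of those delays; it is not the upload action's implementation, and `RetryWithBackoff` and its parameters are hypothetical names.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper: retries a request with growing, jittered delays. */
final class RetryWithBackoff {

    static <T> T call(Callable<T> request, int maxAttempts, long baseDelayMs, double factor)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        long delay = baseDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (Exception e) {
                if (e instanceof InterruptedException) {
                    // Do not retry after an interrupt; restore the flag and give up.
                    Thread.currentThread().interrupt();
                    throw e;
                }
                last = e;
                if (attempt == maxAttempts) {
                    break;
                }
                // Assumption: add up to 50% random jitter on top of the current delay.
                long sleepMs = delay + ThreadLocalRandom.current().nextLong(delay / 2 + 1);
                Thread.sleep(sleepMs);
                delay = (long) (delay * factor);
            }
        }
        throw new Exception("Failed to make request after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) throws Exception {
        // Simulated request that times out twice before succeeding.
        int[] calls = {0};
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new IOException("Request timeout");
            }
            return "uploaded";
        }, 5, 10L, 1.5);
        System.out.println(result + " on attempt " + calls[0]);
    }
}
```

When every attempt fails, the last failure is kept as the cause of the final exception, so the original timeout stays visible in the stack trace.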





[jira] [Commented] (FLINK-35342) MaterializedTableStatementITCase test can check for wrong status

2024-05-13 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846007#comment-17846007
 ] 

Ryan Skraba commented on FLINK-35342:
-

[~hackergin] I found this failure in this new test, from a few days ago.  Do 
you think it's likely to happen again?

> MaterializedTableStatementITCase test can check for wrong status
> 
>
> Key: FLINK-35342
> URL: https://issues.apache.org/jira/browse/FLINK-35342
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.20.0
> Reporter: Ryan Skraba
> Priority: Critical
> Labels: test-stability
>
> * 1.20 AdaptiveScheduler / Test (module: table) 
> https://github.com/apache/flink/actions/runs/9056197319/job/24879135605#step:10:12490
>  
> It looks like 
> {{MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume}}
>  can be flaky, where the expected status is not yet RUNNING:
> {code}
> Error: 03:24:03 03:24:03.902 [ERROR] Tests run: 6, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 26.78 s <<< FAILURE! -- in 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase
> Error: 03:24:03 03:24:03.902 [ERROR] 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(Path,
>  RestClusterClient) -- Time elapsed: 3.850 s <<< FAILURE!
> May 13 03:24:03 org.opentest4j.AssertionFailedError: 
> May 13 03:24:03 
> May 13 03:24:03 expected: "RUNNING"
> May 13 03:24:03  but was: "CREATED"
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> May 13 03:24:03   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> May 13 03:24:03   at 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(MaterializedTableStatementITCase.java:650)
> May 13 03:24:03   at java.lang.reflect.Method.invoke(Method.java:498)
> May 13 03:24:03   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> May 13 03:24:03 
> May 13 03:24:04 03:24:04.270 [INFO] 
> May 13 03:24:04 03:24:04.270 [INFO] Results:
> May 13 03:24:04 03:24:04.270 [INFO] 
> Error: 03:24:04 03:24:04.270 [ERROR] Failures: 
> Error: 03:24:04 03:24:04.271 [ERROR]   
> MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume:650
>  
> May 13 03:24:04 expected: "RUNNING"
> May 13 03:24:04  but was: "CREATED"
> May 13 03:24:04 03:24:04.271 [INFO] 
> Error: 03:24:04 03:24:04.271 [ERROR] Tests run: 82, Failures: 1, Errors: 0, 
> Skipped: 0
> {code}





[jira] [Updated] (FLINK-35342) MaterializedTableStatementITCase test can check for wrong status

2024-05-13 Thread Ryan Skraba (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Skraba updated FLINK-35342:

Affects Version/s: 1.20.0

> MaterializedTableStatementITCase test can check for wrong status
> 
>
> Key: FLINK-35342
> URL: https://issues.apache.org/jira/browse/FLINK-35342
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.20.0
> Reporter: Ryan Skraba
> Priority: Critical
> Labels: test-stability
>
> * 1.20 AdaptiveScheduler / Test (module: table) 
> https://github.com/apache/flink/actions/runs/9056197319/job/24879135605#step:10:12490
>  
> It looks like 
> {{MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume}}
>  can be flaky, where the expected status is not yet RUNNING:
> {code}
> Error: 03:24:03 03:24:03.902 [ERROR] Tests run: 6, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 26.78 s <<< FAILURE! -- in 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase
> Error: 03:24:03 03:24:03.902 [ERROR] 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(Path,
>  RestClusterClient) -- Time elapsed: 3.850 s <<< FAILURE!
> May 13 03:24:03 org.opentest4j.AssertionFailedError: 
> May 13 03:24:03 
> May 13 03:24:03 expected: "RUNNING"
> May 13 03:24:03  but was: "CREATED"
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> May 13 03:24:03   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> May 13 03:24:03   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> May 13 03:24:03   at 
> org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(MaterializedTableStatementITCase.java:650)
> May 13 03:24:03   at java.lang.reflect.Method.invoke(Method.java:498)
> May 13 03:24:03   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> May 13 03:24:03   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> May 13 03:24:03 
> May 13 03:24:04 03:24:04.270 [INFO] 
> May 13 03:24:04 03:24:04.270 [INFO] Results:
> May 13 03:24:04 03:24:04.270 [INFO] 
> Error: 03:24:04 03:24:04.270 [ERROR] Failures: 
> Error: 03:24:04 03:24:04.271 [ERROR]   
> MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume:650
>  
> May 13 03:24:04 expected: "RUNNING"
> May 13 03:24:04  but was: "CREATED"
> May 13 03:24:04 03:24:04.271 [INFO] 
> Error: 03:24:04 03:24:04.271 [ERROR] Tests run: 82, Failures: 1, Errors: 0, 
> Skipped: 0
> {code}





[jira] [Created] (FLINK-35342) MaterializedTableStatementITCase test can check for wrong status

2024-05-13 Thread Ryan Skraba (Jira)
Ryan Skraba created FLINK-35342:
---

Summary: MaterializedTableStatementITCase test can check for wrong status
Key: FLINK-35342
URL: https://issues.apache.org/jira/browse/FLINK-35342
Project: Flink
Issue Type: Bug
Reporter: Ryan Skraba


* 1.20 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/9056197319/job/24879135605#step:10:12490
 
It looks like 
{{MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume}} 
can be flaky, where the expected status is not yet RUNNING:

{code}
Error: 03:24:03 03:24:03.902 [ERROR] Tests run: 6, Failures: 1, Errors: 0, 
Skipped: 0, Time elapsed: 26.78 s <<< FAILURE! -- in 
org.apache.flink.table.gateway.service.MaterializedTableStatementITCase
Error: 03:24:03 03:24:03.902 [ERROR] 
org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(Path,
 RestClusterClient) -- Time elapsed: 3.850 s <<< FAILURE!
May 13 03:24:03 org.opentest4j.AssertionFailedError: 
May 13 03:24:03 
May 13 03:24:03 expected: "RUNNING"
May 13 03:24:03  but was: "CREATED"
May 13 03:24:03 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
May 13 03:24:03 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
May 13 03:24:03 at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
May 13 03:24:03 at 
org.apache.flink.table.gateway.service.MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume(MaterializedTableStatementITCase.java:650)
May 13 03:24:03 at java.lang.reflect.Method.invoke(Method.java:498)
May 13 03:24:03 at 
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
May 13 03:24:03 at 
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
May 13 03:24:03 at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
May 13 03:24:03 at 
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
May 13 03:24:03 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
May 13 03:24:03 
May 13 03:24:04 03:24:04.270 [INFO] 
May 13 03:24:04 03:24:04.270 [INFO] Results:
May 13 03:24:04 03:24:04.270 [INFO] 
Error: 03:24:04 03:24:04.270 [ERROR] Failures: 
Error: 03:24:04 03:24:04.271 [ERROR]   
MaterializedTableStatementITCase.testAlterMaterializedTableSuspendAndResume:650 
May 13 03:24:04 expected: "RUNNING"
May 13 03:24:04  but was: "CREATED"
May 13 03:24:04 03:24:04.271 [INFO] 
Error: 03:24:04 03:24:04.271 [ERROR] Tests run: 82, Failures: 1, Errors: 0, 
Skipped: 0
{code}
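
One common way to avoid this kind of race in a test is to poll for the expected status with a timeout instead of asserting it once. The following is a minimal sketch under that assumption, not the fix that was applied for this issue; `StatusPoller` and `waitForStatus` are hypothetical names, and the status source is simulated with a counter rather than a real Flink cluster.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper, not part of Flink: polls a status until it matches or a deadline passes. */
final class StatusPoller {

    static <T> T waitForStatus(
            Supplier<T> currentStatus, T expected, Duration timeout, Duration interval)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            T actual = currentStatus.get();
            if (expected.equals(actual)) {
                return actual;
            }
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("expected: " + expected + " but was: " + actual);
            }
            Thread.sleep(interval.toMillis());
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated job that reports CREATED twice before reaching RUNNING.
        AtomicInteger polls = new AtomicInteger();
        Supplier<String> status = () -> polls.incrementAndGet() < 3 ? "CREATED" : "RUNNING";

        String result =
                waitForStatus(status, "RUNNING", Duration.ofSeconds(5), Duration.ofMillis(10));
        System.out.println(result + " after " + polls.get() + " polls");
    }
}
```

With a check like this, a job that is still in CREATED when the test first looks would be given time to reach RUNNING, and a job that never gets there would still fail with a message naming both states.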





[jira] [Commented] (FLINK-35335) StateCheckpointedITCase failed fatally with 127 exit code

2024-05-13 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845948#comment-17845948
 ] 

Ryan Skraba commented on FLINK-35335:
-

* 1.19 test_cron_adaptive_scheduler tests 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59499=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=8379

> StateCheckpointedITCase failed fatally with 127 exit code
> -
>
> Key: FLINK-35335
> URL: https://issues.apache.org/jira/browse/FLINK-35335
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.19.1
> Reporter: Ryan Skraba
> Priority: Major
> Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59499=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=8379
> {code}
> May 13 01:50:22 01:50:22.272 [INFO] Tests run: 6, Failures: 0, Errors: 0, 
> Skipped: 0, Time elapsed: 30.03 s -- in 
> org.apache.flink.test.streaming.runtime.CacheITCase
> May 13 01:50:23 01:50:23.142 [INFO] Tests run: 1, Failures: 0, Errors: 0, 
> Skipped: 0, Time elapsed: 5.234 s -- in 
> org.apache.flink.test.streaming.experimental.CollectITCase
> May 13 01:50:23 01:50:23.611 [INFO] 
> May 13 01:50:23 01:50:23.611 [INFO] Results:
> May 13 01:50:23 01:50:23.611 [INFO] 
> May 13 01:50:23 01:50:23.611 [WARNING] Tests run: 1960, Failures: 0, Errors: 
> 0, Skipped: 25
> May 13 01:50:23 01:50:23.611 [INFO] 
> May 13 01:50:23 01:50:23.674 [INFO] 
> 
> May 13 01:50:23 01:50:23.674 [INFO] BUILD FAILURE
> May 13 01:50:23 01:50:23.674 [INFO] 
> 
> May 13 01:50:23 01:50:23.676 [INFO] Total time:  41:24 min
> May 13 01:50:23 01:50:23.677 [INFO] Finished at: 2024-05-13T01:50:23Z
> May 13 01:50:23 01:50:23.677 [INFO] 
> 
> May 13 01:50:23 01:50:23.677 [WARNING] The requested profile 
> "skip-webui-build" could not be activated because it does not exist.
> May 13 01:50:23 01:50:23.678 [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:3.2.2:test (integration-tests) 
> on project flink-tests: 
> May 13 01:50:23 01:50:23.678 [ERROR] 
> May 13 01:50:23 01:50:23.678 [ERROR] Please refer to 
> /__w/2/s/flink-tests/target/surefire-reports for the individual test results.
> May 13 01:50:23 01:50:23.678 [ERROR] Please refer to dump files (if any 
> exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
> May 13 01:50:23 01:50:23.678 [ERROR] ExecutionException The forked VM 
> terminated without properly saying goodbye. VM crash or System.exit called?
> May 13 01:50:23 01:50:23.678 [ERROR] Command was /bin/sh -c cd 
> '/__w/2/s/flink-tests' && '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' 
> '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' 
> '--add-opens=java.base/java.util=ALL-UNNAMED' 
> '--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar' 
> '/__w/2/s/flink-tests/target/surefire/surefirebooter-20240513010926195_686.jar'
>  '/__w/2/s/flink-tests/target/surefire' '2024-05-13T01-09-20_665-jvmRun1' 
> 'surefire-20240513010926195_684tmp' 'surefire_206-20240513010926195_685tmp'
> May 13 01:50:23 01:50:23.679 [ERROR] Error occurred in starting fork, check 
> output in log
> May 13 01:50:23 01:50:23.679 [ERROR] Process Exit Code: 127
> May 13 01:50:23 01:50:23.679 [ERROR] Crashed tests:
> May 13 01:50:23 01:50:23.679 [ERROR] 
> org.apache.flink.test.checkpointing.StateCheckpointedITCase
> May 13 01:50:23 01:50:23.679 [ERROR] 
> org.apache.maven.surefire.booter.SurefireBooterForkException: 
> ExecutionException The forked VM terminated without properly saying goodbye. 
> VM crash or System.exit called?
> May 13 01:50:23 01:50:23.679 [ERROR] Command was /bin/sh -c cd 
> '/__w/2/s/flink-tests' && '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' 
> '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' 
> '--add-opens=java.base/java.util=ALL-UNNAMED' 
> '--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar' 
> '/__w/2/s/flink-tests/target/surefire/surefirebooter-20240513010926195_686.jar'
>  '/__w/2/s/flink-tests/target/surefire' '2024-05-13T01-09-20_665-jvmRun1' 
> 'surefire-20240513010926195_684tmp' 'surefire_206-20240513010926195_685tmp'
> May 13 01:50:23 01:50:23.679 [ERROR] Error occurred in starting fork, check 
> output in log
> May 13 01:50:23 01:50:23.679 [ERROR] Process Exit Code: 127
> May 13 01:50:23 01:50:23.679 [ERROR] Crashed tests:
> May 13 01:50:23 01:50:23.679 [ERROR] 
> org.apache.flink.test.checkpointing.StateCheckpointedITCase
> May 13 01:50:23 01:50:23.679 [ERROR]  at 
> 

[jira] [Commented] (FLINK-35254) build_wheels_on_macos failed

2024-05-13 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845946#comment-17845946
 ] 

Ryan Skraba commented on FLINK-35254:
-

* 1.20 build_wheels_on_macos 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59476=logs=f73b5736-8355-5390-ec71-4dfdec0ce6c5=90f7230e-bf5a-531b-8566-ad48d3e03bbb=426


> build_wheels_on_macos failed
> 
>
> Key: FLINK-35254
> URL: https://issues.apache.org/jira/browse/FLINK-35254
> Project: Flink
> Issue Type: Bug
> Components: Build System / CI
> Affects Versions: 1.20.0
> Reporter: Weijie Guo
> Priority: Major
>
> {code:java}
>  ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If 
> you have updated the package versions, please update the hashes. Otherwise, 
> examine the package contents carefully; someone may have tampered with them.
>   unknown package:
>   Expected sha256 
> f12932e5a6feb5c58192209af1d2607d488cb1d404fbc038ac12ada60327fa34
>Got
> 1c61bf307881167fe169de79c02f46d16fc5cd35781e02a40bf1f13671cdc22c
>   
>   [end of output]
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59219=logs=f73b5736-8355-5390-ec71-4dfdec0ce6c5=90f7230e-bf5a-531b-8566-ad48d3e03bbb=288



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35284) Streaming File Sink end-to-end test times out

2024-05-13 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845947#comment-17845947
 ] 

Ryan Skraba commented on FLINK-35284:
-

* 1.20 AdaptiveScheduler / E2E (group 2) 
https://github.com/apache/flink/actions/runs/9048112585/job/24860957143#step:14:3735
* 1.20 e2e_2_cron_adaptive_scheduler 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59498=logs=fb37c667-81b7-5c22-dd91-846535e99a97=011e961e-597c-5c96-04fe-7941c8b83f23=4404

> Streaming File Sink end-to-end test times out
> -
>
> Key: FLINK-35284
> URL: https://issues.apache.org/jira/browse/FLINK-35284
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.20.0
>Reporter: Ryan Skraba
>Priority: Major
>  Labels: test-stability
>
> 1.20 e2e_2_cron_adaptive_scheduler 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59303=logs=fb37c667-81b7-5c22-dd91-846535e99a97=011e961e-597c-5c96-04fe-7941c8b83f23=3076
> {code}
> May 01 01:08:42 Test (pid: 127498) did not finish after 900 seconds.
> May 01 01:08:42 Printing Flink logs and killing it:
> {code}
> This looks like a consequence of hundreds of 
> {{RecipientUnreachableException}}s like: 
> {code}
> 2024-05-01 00:55:00,496 WARN  
> org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer 
> [] - Slot allocation for allocation 2ec550d8331cd53c32fd899e1e9a0fa5 for job 
> 5654b195450b352be998673f1637fc43 failed.
> org.apache.flink.runtime.rpc.exceptions.RecipientUnreachableException: Could 
> not send message [RemoteRpcInvocation(TaskExecutorGateway.requestSlot(SlotID, 
> JobID, AllocationID, ResourceProfile, String, ResourceManagerId, Time))] from 
> sender [Actor[pekko://flink/temp/taskmanager_0$De]] to recipient 
> [Actor[pekko.ssl.tcp://flink@localhost:40665/user/rpc/taskmanager_0#-299862847]],
>  because the recipient is unreachable. This can either mean that the 
> recipient has been terminated or that the remote RpcService is currently not 
> reachable.
>   at 
> org.apache.flink.runtime.rpc.pekko.DeadLettersActor.handleDeadLetter(DeadLettersActor.java:61)
>  ~[flink-rpc-akkafe85d469-8ced-4732-922e-62c82b554871.jar:1.20-SNAPSHOT]
>   at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) 
> ~[flink-rpc-akkafe85d469-8ced-4732-922e-62c82b554871.jar:1.20-SNAPSHOT]
>   at 
> org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) 
> ~[flink-rpc-akkafe85d469-8ced-4732-922e-62c82b554871.jar:1.20-SNAPSHOT]
>   at scala.PartialFunction.applyOrElse(PartialFunction.scala:127) 
> ~[flink-rpc-akkafe85d469-8ced-4732-922e-62c82b554871.jar:1.20-SNAPSHOT]
> {code}





[jira] [Commented] (FLINK-35041) IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration failed

2024-05-13 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845945#comment-17845945
 ] 

Ryan Skraba commented on FLINK-35041:
-

Thanks so much!  I've verified that I can no longer reproduce this error by 
repeatedly running the entire package of tests.

> IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration failed
> --
>
> Key: FLINK-35041
> URL: https://issues.apache.org/jira/browse/FLINK-35041
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Affects Versions: 1.20.0
>Reporter: Weijie Guo
>Assignee: Rui Fan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> {code:java}
> Apr 08 03:22:45 03:22:45.450 [ERROR] 
> org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration
>  -- Time elapsed: 0.034 s <<< FAILURE!
> Apr 08 03:22:45 org.opentest4j.AssertionFailedError: 
> Apr 08 03:22:45 
> Apr 08 03:22:45 expected: false
> Apr 08 03:22:45  but was: true
> Apr 08 03:22:45   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> Apr 08 03:22:45   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> Apr 08 03:22:45   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(K.java:45)
> Apr 08 03:22:45   at 
> org.apache.flink.runtime.state.DiscardRecordedStateObject.verifyDiscard(DiscardRecordedStateObject.java:34)
> Apr 08 03:22:45   at 
> org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration(IncrementalRemoteKeyedStateHandleTest.java:211)
> Apr 08 03:22:45   at java.lang.reflect.Method.invoke(Method.java:498)
> Apr 08 03:22:45   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> {code}
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58782=logs=77a9d8e1-d610-59b3-fc2a-4766541e0e33=125e07e7-8de0-5c6c-a541-a567415af3ef=9238]
>  





[jira] [Commented] (FLINK-35002) GitHub action request timeout to ArtifactService

2024-05-13 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845943#comment-17845943
 ] 

Ryan Skraba commented on FLINK-35002:
-

* 1.18 AdaptiveScheduler / Compile 
https://github.com/apache/flink/commit/09f7b070989a906d777a000e6ec3d9b45e192a29/checks/24844337742/logs


> GitHub action request timeout to ArtifactService
> -
>
> Key: FLINK-35002
> URL: https://issues.apache.org/jira/browse/FLINK-35002
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Reporter: Ryan Skraba
>Priority: Major
>  Labels: github-actions, test-stability
>
> A timeout can occur when uploading a successfully built artifact:
>  * [https://github.com/apache/flink/actions/runs/8516411871/job/23325392650]
> {code:java}
> 2024-04-02T02:20:15.6355368Z With the provided path, there will be 1 file 
> uploaded
> 2024-04-02T02:20:15.6360133Z Artifact name is valid!
> 2024-04-02T02:20:15.6362872Z Root directory input is valid!
> 2024-04-02T02:20:20.6975036Z Attempt 1 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 3000 ms...
> 2024-04-02T02:20:28.7084937Z Attempt 2 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 4785 ms...
> 2024-04-02T02:20:38.5015936Z Attempt 3 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 7375 ms...
> 2024-04-02T02:20:50.8901508Z Attempt 4 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 14988 ms...
> 2024-04-02T02:21:10.9028438Z ##[error]Failed to CreateArtifact: Failed to 
> make request after 5 attempts: Request timeout: 
> /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact
> 2024-04-02T02:22:59.9893296Z Post job cleanup.
> 2024-04-02T02:22:59.9958844Z Post job cleanup. {code}
> (This is unlikely to be something we can fix, but we can track it.)
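
For reference, the log above shows five attempts separated by growing delays (3000, 4785, 7375 and 14988 ms). The upload action's actual retry algorithm is not shown in this thread, so the following is only a minimal sketch of a plain exponential backoff without jitter. The class name, method names and parameter values are all hypothetical.

```java
import java.util.concurrent.Callable;

public class RetryWithBackoff {

    /** Delay before the next try: baseMillis * multiplier^(attempt - 1). No jitter (assumption). */
    static long delayMillis(int attempt, long baseMillis, double multiplier) {
        return Math.round(baseMillis * Math.pow(multiplier, attempt - 1));
    }

    /** Runs the call up to maxAttempts times, sleeping between failed attempts. */
    static <T> T retry(Callable<T> call, int maxAttempts, long baseMillis, double multiplier)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                // Only sleep if another attempt will follow.
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis(attempt, baseMillis, multiplier));
                }
            }
        }
        throw new Exception("Failed after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated request that times out twice and then succeeds.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new java.io.IOException("Request timeout");
            }
            return "uploaded";
        }, 5, 10, 1.5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A backoff like this only helps with short outages. If the service stays unreachable for the whole retry window, as in the log above, the job still fails.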





[jira] [Commented] (FLINK-34227) Job doesn't disconnect from ResourceManager

2024-05-13 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845942#comment-17845942
 ] 

Ryan Skraba commented on FLINK-34227:
-

* 1.19 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/9056197345/job/24878932830#step:10:14273
* 1.18 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/9056197329/job/24879136968#step:10:11976


> Job doesn't disconnect from ResourceManager
> ---
>
> Key: FLINK-34227
> URL: https://issues.apache.org/jira/browse/FLINK-34227
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0, 1.18.1
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: github-actions, pull-request-available, test-stability
> Attachments: FLINK-34227.7e7d69daebb438b8d03b7392c9c55115.log, 
> FLINK-34227.log
>
>
> https://github.com/XComp/flink/actions/runs/7634987973/job/20800205972#step:10:14557
> {code}
> [...]
> "main" #1 prio=5 os_prio=0 tid=0x7f4b7000 nid=0x24ec0 waiting on 
> condition [0x7fccce1eb000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xbdd52618> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2131)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2099)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2077)
>   at 
> org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:876)
>   at 
> org.apache.flink.table.planner.runtime.stream.sql.WindowDistinctAggregateITCase.testHopWindow_Cube(WindowDistinctAggregateITCase.scala:550)
> [...]
> {code}





[jira] [Commented] (FLINK-30644) ChangelogCompatibilityITCase.testRestore fails due to CheckpointCoordinator being shutdown

2024-05-13 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845940#comment-17845940
 ] 

Ryan Skraba commented on FLINK-30644:
-

* 1.19 test_cron_jdk17 tests 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59477=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=8264


> ChangelogCompatibilityITCase.testRestore fails due to CheckpointCoordinator 
> being shutdown
> --
>
> Key: FLINK-30644
> URL: https://issues.apache.org/jira/browse/FLINK-30644
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Runtime / State Backends
>Affects Versions: 1.17.0, 1.19.1
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: auto-deprioritized-critical, test-stability
>
> We observe a build failure in {{ChangelogCompatibilityITCase.testRestore}} 
> due to the {{CheckpointCoordinator}} being shut down:
> {code:java}
> [...]
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: 
> CheckpointCoordinator shutdown.
> Jan 12 02:37:37   at 
> org.apache.flink.runtime.checkpoint.PendingCheckpoint.abort(PendingCheckpoint.java:544)
> Jan 12 02:37:37   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2140)
> Jan 12 02:37:37   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2127)
> Jan 12 02:37:37   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoints(CheckpointCoordinator.java:2004)
> Jan 12 02:37:37   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoints(CheckpointCoordinator.java:1987)
> Jan 12 02:37:37   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingAndQueuedCheckpoints(CheckpointCoordinator.java:2183)
> Jan 12 02:37:37   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.shutdown(CheckpointCoordinator.java:426)
> Jan 12 02:37:37   at 
> org.apache.flink.runtime.executiongraph.DefaultExecutionGraph.onTerminalState(DefaultExecutionGraph.java:1329)
> [...]{code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44731=logs=2c3cbe13-dee0-5837-cf47-3053da9a8a78=b78d9d30-509a-5cea-1fef-db7abaa325ae=9255





[jira] [Commented] (FLINK-28440) EventTimeWindowCheckpointingITCase failed with restore

2024-05-13 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845939#comment-17845939
 ] 

Ryan Skraba commented on FLINK-28440:
-

* 1.18 Default (Java 8) / Test (module: tests) 
https://github.com/apache/flink/actions/runs/9029200531/job/24811449545#step:10:8625
* 1.19 test_cron_azure tests 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59499=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=0c010d0c-3dec-5bf1-d408-7b18988b1b2b=8120


> EventTimeWindowCheckpointingITCase failed with restore
> --
>
> Key: FLINK-28440
> URL: https://issues.apache.org/jira/browse/FLINK-28440
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / State Backends
>Affects Versions: 1.16.0, 1.17.0, 1.18.0, 1.19.0
>Reporter: Huang Xingbo
>Assignee: Yanfei Lei
>Priority: Critical
>  Labels: auto-deprioritized-critical, pull-request-available, 
> stale-assigned, test-stability
> Fix For: 1.20.0
>
> Attachments: image-2023-02-01-00-51-54-506.png, 
> image-2023-02-01-01-10-01-521.png, image-2023-02-01-01-19-12-182.png, 
> image-2023-02-01-16-47-23-756.png, image-2023-02-01-16-57-43-889.png, 
> image-2023-02-02-10-52-56-599.png, image-2023-02-03-10-09-07-586.png, 
> image-2023-02-03-12-03-16-155.png, image-2023-02-03-12-03-56-614.png
>
>
> {code:java}
> Caused by: java.lang.Exception: Exception while creating 
> StreamOperatorStateContext.
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:256)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:722)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:698)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:665)
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935)
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:904)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_0a448493b4782967b150582570326227_(2/4) from 
> any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:353)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:165)
>   ... 11 more
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
> /tmp/junit1835099326935900400/junit1113650082510421526/52ee65b7-033f-4429-8ddd-adbe85e27ced
>  (No such file or directory)
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.advance(StateChangelogHandleStreamHandleReader.java:87)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.hasNext(StateChangelogHandleStreamHandleReader.java:69)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.readBackendHandle(ChangelogBackendRestoreOperation.java:96)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.restore(ChangelogBackendRestoreOperation.java:75)
>   at 
> org.apache.flink.state.changelog.ChangelogStateBackend.restore(ChangelogStateBackend.java:92)
>   at 
> org.apache.flink.state.changelog.AbstractChangelogStateBackend.createKeyedStateBackend(AbstractChangelogStateBackend.java:136)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:336)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
>   at 
> 

[jira] [Commented] (FLINK-26644) python StreamExecutionEnvironmentTests.test_generate_stream_graph_with_dependencies failed on azure

2024-05-13 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845938#comment-17845938
 ] 

Ryan Skraba commented on FLINK-26644:
-

* 1.18 Java 11 / Test (module: python) 
https://github.com/apache/flink/actions/runs/9040330854/job/24844520768#step:10:24270

> python 
> StreamExecutionEnvironmentTests.test_generate_stream_graph_with_dependencies 
> failed on azure
> ---
>
> Key: FLINK-26644
> URL: https://issues.apache.org/jira/browse/FLINK-26644
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.14.4, 1.15.0, 1.16.0, 1.19.0
>Reporter: Yun Gao
>Priority: Minor
>  Labels: auto-deprioritized-major, test-stability
>
> {code:java}
> 2022-03-14T18:50:24.6842853Z Mar 14 18:50:24 
> === FAILURES 
> ===
> 2022-03-14T18:50:24.6844089Z Mar 14 18:50:24 _ 
> StreamExecutionEnvironmentTests.test_generate_stream_graph_with_dependencies _
> 2022-03-14T18:50:24.6844846Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6846063Z Mar 14 18:50:24 self = 
>   testMethod=test_generate_stream_graph_with_dependencies>
> 2022-03-14T18:50:24.6847104Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6847766Z Mar 14 18:50:24 def 
> test_generate_stream_graph_with_dependencies(self):
> 2022-03-14T18:50:24.6848677Z Mar 14 18:50:24 python_file_dir = 
> os.path.join(self.tempdir, "python_file_dir_" + str(uuid.uuid4()))
> 2022-03-14T18:50:24.6849833Z Mar 14 18:50:24 os.mkdir(python_file_dir)
> 2022-03-14T18:50:24.6850729Z Mar 14 18:50:24 python_file_path = 
> os.path.join(python_file_dir, "test_stream_dependency_manage_lib.py")
> 2022-03-14T18:50:24.6852679Z Mar 14 18:50:24 with 
> open(python_file_path, 'w') as f:
> 2022-03-14T18:50:24.6853646Z Mar 14 18:50:24 f.write("def 
> add_two(a):\nreturn a + 2")
> 2022-03-14T18:50:24.6854394Z Mar 14 18:50:24 env = self.env
> 2022-03-14T18:50:24.6855019Z Mar 14 18:50:24 
> env.add_python_file(python_file_path)
> 2022-03-14T18:50:24.6855519Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6856254Z Mar 14 18:50:24 def plus_two_map(value):
> 2022-03-14T18:50:24.6857045Z Mar 14 18:50:24 from 
> test_stream_dependency_manage_lib import add_two
> 2022-03-14T18:50:24.6857865Z Mar 14 18:50:24 return value[0], 
> add_two(value[1])
> 2022-03-14T18:50:24.6858466Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6858924Z Mar 14 18:50:24 def add_from_file(i):
> 2022-03-14T18:50:24.6859806Z Mar 14 18:50:24 with 
> open("data/data.txt", 'r') as f:
> 2022-03-14T18:50:24.6860266Z Mar 14 18:50:24 return i[0], 
> i[1] + int(f.read())
> 2022-03-14T18:50:24.6860879Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6862022Z Mar 14 18:50:24 from_collection_source = 
> env.from_collection([('a', 0), ('b', 0), ('c', 1), ('d', 1),
> 2022-03-14T18:50:24.6863259Z Mar 14 18:50:24  
>  ('e', 2)],
> 2022-03-14T18:50:24.6864057Z Mar 14 18:50:24  
> type_info=Types.ROW([Types.STRING(),
> 2022-03-14T18:50:24.6864651Z Mar 14 18:50:24  
>  Types.INT()]))
> 2022-03-14T18:50:24.6865150Z Mar 14 18:50:24 
> from_collection_source.name("From Collection")
> 2022-03-14T18:50:24.6866212Z Mar 14 18:50:24 keyed_stream = 
> from_collection_source.key_by(lambda x: x[1], key_type=Types.INT())
> 2022-03-14T18:50:24.6867083Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6867793Z Mar 14 18:50:24 plus_two_map_stream = 
> keyed_stream.map(plus_two_map).name("Plus Two Map").set_parallelism(3)
> 2022-03-14T18:50:24.6868620Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6869412Z Mar 14 18:50:24 add_from_file_map = 
> plus_two_map_stream.map(add_from_file).name("Add From File Map")
> 2022-03-14T18:50:24.6870239Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6870883Z Mar 14 18:50:24 test_stream_sink = 
> add_from_file_map.add_sink(self.test_sink).name("Test Sink")
> 2022-03-14T18:50:24.6871803Z Mar 14 18:50:24 
> test_stream_sink.set_parallelism(4)
> 2022-03-14T18:50:24.6872291Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6872756Z Mar 14 18:50:24 archive_dir_path = 
> os.path.join(self.tempdir, "archive_" + str(uuid.uuid4()))
> 2022-03-14T18:50:24.6873557Z Mar 14 18:50:24 
> os.mkdir(archive_dir_path)
> 2022-03-14T18:50:24.6874817Z Mar 14 18:50:24 with 
> open(os.path.join(archive_dir_path, "data.txt"), 'w') as f:
> 2022-03-14T18:50:24.6875414Z Mar 14 18:50:24 f.write("3")
> 2022-03-14T18:50:24.6875906Z Mar 14 

[jira] [Commented] (FLINK-23577) CoordinatedSourceRescaleITCase.testUpscaling fails with NoSuchFileException

2024-05-13 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-23577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845937#comment-17845937
 ] 

Ryan Skraba commented on FLINK-23577:
-

Looks like this flaky test has made a reappearance!
* 1.19 test_cron_adaptive_scheduler connect 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59455=logs=93ebd72a-004d-5a68-6295-7ace4ad889cd=35e92294-2840-51f1-1753-ae015c24c41f=10519

> CoordinatedSourceRescaleITCase.testUpscaling fails with NoSuchFileException
> ---
>
> Key: FLINK-23577
> URL: https://issues.apache.org/jira/browse/FLINK-23577
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.12.4
>Reporter: Xintong Song
>Priority: Major
>  Labels: test-stability
> Fix For: 1.12.8
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=21256=logs=298e20ef-7951-5965-0e79-ea664ddc435e=b4cd3436-dbe8-556d-3bca-42f92c3cbf2f=21306
> {code}
> [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.874 
> s <<< FAILURE! - in 
> org.apache.flink.connector.base.source.reader.CoordinatedSourceRescaleITCase
> [ERROR] 
> testUpscaling(org.apache.flink.connector.base.source.reader.CoordinatedSourceRescaleITCase)
>   Time elapsed: 5.32 s  <<< ERROR!
> java.io.UncheckedIOException: java.nio.file.NoSuchFileException: 
> /tmp/junit5156435599891303309/junit3268016245125781188/79604f102e69d25f3258a72a648dfdef/chk-8
>   at 
> java.base/java.nio.file.FileTreeIterator.fetchNextIfNeeded(FileTreeIterator.java:87)
>   at 
> java.base/java.nio.file.FileTreeIterator.hasNext(FileTreeIterator.java:103)
>   at java.base/java.util.Iterator.forEachRemaining(Iterator.java:132)
>   at 
> java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>   at 
> java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
>   at 
> java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
>   at 
> java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
>   at 
> java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.base/java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:558)
>   at 
> java.base/java.util.stream.ReferencePipeline.max(ReferencePipeline.java:594)
>   at 
> org.apache.flink.connector.base.source.reader.CoordinatedSourceRescaleITCase.generateCheckpoint(CoordinatedSourceRescaleITCase.java:83)
>   at 
> org.apache.flink.connector.base.source.reader.CoordinatedSourceRescaleITCase.testUpscaling(CoordinatedSourceRescaleITCase.java:70)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at 

[jira] [Commented] (FLINK-18476) PythonEnvUtilsTest#testStartPythonProcess fails

2024-05-13 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845936#comment-17845936
 ] 

Ryan Skraba commented on FLINK-18476:
-

* 1.19 test_cron_hadoop313 misc 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59477=logs=245e1f2e-ba5b-5570-d689-25ae21e5302f=d04c9862-880c-52f5-574b-a7a79fef8e0f=22042

> PythonEnvUtilsTest#testStartPythonProcess fails
> ---
>
> Key: FLINK-18476
> URL: https://issues.apache.org/jira/browse/FLINK-18476
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Tests
>Affects Versions: 1.11.0, 1.15.3, 1.18.0, 1.19.0, 1.20.0
>Reporter: Dawid Wysakowicz
>Priority: Major
>  Labels: auto-deprioritized-major, auto-deprioritized-minor, 
> test-stability
>
> The 
> {{org.apache.flink.client.python.PythonEnvUtilsTest#testStartPythonProcess}} 
> failed in my local environment as it assumes the environment has 
> {{/usr/bin/python}}. 
> I don't know exactly how I got Python on Ubuntu 20.04, but I only have an 
> alias for {{python = python3}}. Therefore the test fails.
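
A shell alias exists only inside the interactive shell, so a child process started from the JVM never sees it. That is why a hard-coded {{/usr/bin/python}} fails on such a machine. One possible way for a test to avoid the assumption is to search {{PATH}} for an interpreter. The sketch below is not Flink's code; the class name, the method name and the candidate list are hypothetical.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class PythonLocator {

    /** Candidate executable names in preference order (hypothetical, not taken from Flink). */
    static final List<String> CANDIDATES = Arrays.asList("python", "python3");

    /** Returns the first candidate that is an executable regular file in one of the PATH entries. */
    static Optional<Path> findOnPath(String pathVar, List<String> names) {
        if (pathVar == null || pathVar.isEmpty()) {
            return Optional.empty();
        }
        for (String name : names) {
            for (String dir : pathVar.split(File.pathSeparator)) {
                if (dir.isEmpty()) {
                    continue;
                }
                Path candidate = Paths.get(dir, name);
                if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> python = findOnPath(System.getenv("PATH"), CANDIDATES);
        System.out.println(
                python.map(Path::toString).orElse("no python interpreter found on PATH"));
    }
}
```

A test could use the result to skip itself when no interpreter is found, instead of failing.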





[jira] [Created] (FLINK-35339) Compilation timeout while building flink-dist

2024-05-13 Thread Ryan Skraba (Jira)
Ryan Skraba created FLINK-35339:
---

 Summary: Compilation timeout while building flink-dist
 Key: FLINK-35339
 URL: https://issues.apache.org/jira/browse/FLINK-35339
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.19.1
Reporter: Ryan Skraba


* 1.19 Java 17 / Test (module: python) 
https://github.com/apache/flink/actions/runs/9040330904/job/24844527283#step:10:14325

The CI pipeline fails with:

{code}
May 11 02:44:25 Process exited with EXIT CODE: 143.
May 11 02:44:25 Trying to KILL watchdog (49546).
May 11 02:44:25 
==
May 11 02:44:25 Compilation failure detected, skipping test execution.
May 11 02:44:25 
==
{code}
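
As a reading aid for the exit code above: by POSIX shell convention, a status above 128 means the process was terminated by signal (status - 128), so 143 corresponds to signal 15 (SIGTERM). Statuses 126 and 127 are reserved for "found but not executable" and "command not found". A program can still return any of these values for its own reasons, so the mapping is a hint and not proof. The helper below is a hypothetical sketch, not part of the Flink CI scripts.

```java
public class ExitCodes {

    /** Maps an exit status to its conventional POSIX shell meaning. */
    static String describe(int exitCode) {
        if (exitCode == 0) {
            return "success";
        }
        if (exitCode == 126) {
            return "command found but not executable";
        }
        if (exitCode == 127) {
            return "command not found";
        }
        // Shells report death-by-signal as 128 + signal number.
        if (exitCode > 128 && exitCode <= 128 + 64) {
            return "terminated by signal " + (exitCode - 128);
        }
        return "application-defined failure";
    }

    public static void main(String[] args) {
        System.out.println("143: " + describe(143));
        System.out.println("127: " + describe(127));
    }
}
```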

It looks like this is due to a failed network connection while building 
src/assemblies/bin.xml:

{code}
May 11 02:44:25    java.lang.Thread.State: RUNNABLE
May 11 02:44:25 at sun.nio.ch.Net.connect0(java.base@17.0.7/Native 
Method)
May 11 02:44:25 at sun.nio.ch.Net.connect(java.base@17.0.7/Net.java:579)
May 11 02:44:25 at sun.nio.ch.Net.connect(java.base@17.0.7/Net.java:568)
May 11 02:44:25 at 
sun.nio.ch.NioSocketImpl.connect(java.base@17.0.7/NioSocketImpl.java:588)
May 11 02:44:25 at 
java.net.SocksSocketImpl.connect(java.base@17.0.7/SocksSocketImpl.java:327)
May 11 02:44:25 at 
java.net.Socket.connect(java.base@17.0.7/Socket.java:633)
May 11 02:44:25 at 
org.apache.maven.wagon.providers.http.httpclient.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:368)
May 11 02:44:25 at 
org.apache.maven.wagon.providers.http.httpclient.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
May 11 02:44:25 at 
org.apache.maven.wagon.providers.http.httpclient.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
May 11 02:44:25 at 
org.apache.maven.wagon.providers.http.httpclient.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
May 11 02:44:25 at 
org.apache.maven.wagon.providers.http.httpclient.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
May 11 02:44:25 at 
org.apache.maven.wagon.providers.http.httpclient.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
May 11 02:44:25 at 
org.apache.maven.wagon.providers.http.httpclient.impl.execchain.RetryExec.execute(RetryExec.java:89)
{code}





[jira] [Created] (FLINK-35335) StateCheckpointedITCase failed fatally with 127 exit code

2024-05-13 Thread Ryan Skraba (Jira)
Ryan Skraba created FLINK-35335:
---

 Summary: StateCheckpointedITCase failed fatally with 127 exit code
 Key: FLINK-35335
 URL: https://issues.apache.org/jira/browse/FLINK-35335
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.19.1
Reporter: Ryan Skraba


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59499=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=8379

{code}
May 13 01:50:22 01:50:22.272 [INFO] Tests run: 6, Failures: 0, Errors: 0, 
Skipped: 0, Time elapsed: 30.03 s -- in 
org.apache.flink.test.streaming.runtime.CacheITCase
May 13 01:50:23 01:50:23.142 [INFO] Tests run: 1, Failures: 0, Errors: 0, 
Skipped: 0, Time elapsed: 5.234 s -- in 
org.apache.flink.test.streaming.experimental.CollectITCase
May 13 01:50:23 01:50:23.611 [INFO] 
May 13 01:50:23 01:50:23.611 [INFO] Results:
May 13 01:50:23 01:50:23.611 [INFO] 
May 13 01:50:23 01:50:23.611 [WARNING] Tests run: 1960, Failures: 0, Errors: 0, 
Skipped: 25
May 13 01:50:23 01:50:23.611 [INFO] 
May 13 01:50:23 01:50:23.674 [INFO] 

May 13 01:50:23 01:50:23.674 [INFO] BUILD FAILURE
May 13 01:50:23 01:50:23.674 [INFO] 

May 13 01:50:23 01:50:23.676 [INFO] Total time:  41:24 min
May 13 01:50:23 01:50:23.677 [INFO] Finished at: 2024-05-13T01:50:23Z
May 13 01:50:23 01:50:23.677 [INFO] 

May 13 01:50:23 01:50:23.677 [WARNING] The requested profile "skip-webui-build" 
could not be activated because it does not exist.
May 13 01:50:23 01:50:23.678 [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:3.2.2:test (integration-tests) 
on project flink-tests: 
May 13 01:50:23 01:50:23.678 [ERROR] 
May 13 01:50:23 01:50:23.678 [ERROR] Please refer to 
/__w/2/s/flink-tests/target/surefire-reports for the individual test results.
May 13 01:50:23 01:50:23.678 [ERROR] Please refer to dump files (if any exist) 
[date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
May 13 01:50:23 01:50:23.678 [ERROR] ExecutionException The forked VM 
terminated without properly saying goodbye. VM crash or System.exit called?
May 13 01:50:23 01:50:23.678 [ERROR] Command was /bin/sh -c cd 
'/__w/2/s/flink-tests' && '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' 
'-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' 
'--add-opens=java.base/java.util=ALL-UNNAMED' 
'--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar' 
'/__w/2/s/flink-tests/target/surefire/surefirebooter-20240513010926195_686.jar' 
'/__w/2/s/flink-tests/target/surefire' '2024-05-13T01-09-20_665-jvmRun1' 
'surefire-20240513010926195_684tmp' 'surefire_206-20240513010926195_685tmp'
May 13 01:50:23 01:50:23.679 [ERROR] Error occurred in starting fork, check 
output in log
May 13 01:50:23 01:50:23.679 [ERROR] Process Exit Code: 127
May 13 01:50:23 01:50:23.679 [ERROR] Crashed tests:
May 13 01:50:23 01:50:23.679 [ERROR] 
org.apache.flink.test.checkpointing.StateCheckpointedITCase
May 13 01:50:23 01:50:23.679 [ERROR] 
org.apache.maven.surefire.booter.SurefireBooterForkException: 
ExecutionException The forked VM terminated without properly saying goodbye. VM 
crash or System.exit called?
May 13 01:50:23 01:50:23.679 [ERROR] Command was /bin/sh -c cd 
'/__w/2/s/flink-tests' && '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' 
'-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' 
'--add-opens=java.base/java.util=ALL-UNNAMED' 
'--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar' 
'/__w/2/s/flink-tests/target/surefire/surefirebooter-20240513010926195_686.jar' 
'/__w/2/s/flink-tests/target/surefire' '2024-05-13T01-09-20_665-jvmRun1' 
'surefire-20240513010926195_684tmp' 'surefire_206-20240513010926195_685tmp'
May 13 01:50:23 01:50:23.679 [ERROR] Error occurred in starting fork, check 
output in log
May 13 01:50:23 01:50:23.679 [ERROR] Process Exit Code: 127
May 13 01:50:23 01:50:23.679 [ERROR] Crashed tests:
May 13 01:50:23 01:50:23.679 [ERROR] 
org.apache.flink.test.checkpointing.StateCheckpointedITCase
May 13 01:50:23 01:50:23.679 [ERROR]   at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456)
May 13 01:50:23 01:50:23.679 [ERROR]   at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:418)
May 13 01:50:23 01:50:23.679 [ERROR]   at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:297)
May 13 01:50:23 01:50:23.679 [ERROR]   at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:250)
{code}

In the maven logs, {{runCheckpointedProgram[FailoverStrategy: 
RestartPipelinedRegionFailoverStrategy]}} is started but never completes.
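
For reference when triaging the exit code: in POSIX shells, status 127 conventionally means the command could not be found. Since Surefire launches the fork through {{/bin/sh -c}}, a minimal sketch of that convention is below. It does not establish that this is what happened in this build, and the command name in it is a made-up placeholder.

```python
import subprocess


def shell_exit_code(command):
    """Run `command` through /bin/sh -c and return its exit status."""
    completed = subprocess.run(
        ["/bin/sh", "-c", command],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode


# Hypothetical binary name that should not exist on any machine.
missing = shell_exit_code("flink-no-such-binary-for-illustration --version")
# `true` is a standard utility that always succeeds.
ok = shell_exit_code("true")

print("missing command ->", missing)
print("existing command ->", ok)
```

If the status really came from the shell, checking that the JVM path in the logged command still exists on the agent would be a cheap first step.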





[jira] [Commented] (FLINK-35306) Flink cannot compile with jdk17

2024-05-10 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845275#comment-17845275
 ] 

Ryan Skraba commented on FLINK-35306:
-

Thanks for the quick fix!  Just to document this, we saw the compilation fail 
on GitHub Actions too:
* 1.20 Java 17 / Compile 
https://github.com/apache/flink/commit/29736b8c01924b7da03d4bcbfd9c812a8e5a08b4/checks/24709533133/logs

> Flink cannot compile with jdk17
> ---
>
> Key: FLINK-35306
> URL: https://issues.apache.org/jira/browse/FLINK-35306
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI, Tests
>Affects Versions: 1.20.0
>Reporter: Rui Fan
>Assignee: Rui Fan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.20.0
>
> Attachments: image-2024-05-08-11-48-04-161.png
>
>
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59380=results]
>  fails, and the benchmark with JDK 17 fails as well.
>  
> Reason: TypeSerializerUpgradeTestBase.UpgradeVerifier renamed the 
> schemaCompatibilityMatcher method to schemaCompatibilityCondition, but 
> some subclasses were not updated, such as 
> PojoRecordSerializerUpgradeTestSpecifications.PojoToRecordVerifier.
>  
> That class belongs to the flink-tests-java17 module, which isn't compiled by default.
>  
> It's caused by
>  * https://issues.apache.org/jira/browse/FLINK-25537
>  * [https://github.com/apache/flink/pull/24603]
>  
> !image-2024-05-08-11-48-04-161.png!
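
The failure mode described in the issue (a method renamed in a base class while a subclass keeps the old name) can be sketched in a few lines. This is only a Python analogue with made-up class names: Java rejects the stale subclass at compile time, whereas Python's {{abc}} module rejects it when the class is instantiated.

```python
from abc import ABC, abstractmethod


class UpgradeVerifier(ABC):
    """Stand-in base class; the abstract method carries the new name."""

    @abstractmethod
    def schema_compatibility_condition(self):
        ...


class StaleVerifier(UpgradeVerifier):
    """Hypothetical subclass that still implements the old method name."""

    def schema_compatibility_matcher(self):
        return "compatible"


class UpdatedVerifier(UpgradeVerifier):
    """Hypothetical subclass that was migrated to the new method name."""

    def schema_compatibility_condition(self):
        return "compatible"


def can_instantiate(cls):
    """Return True if `cls` implements every abstract method of its bases."""
    try:
        cls()
        return True
    except TypeError:
        return False


print("StaleVerifier usable:", can_instantiate(StaleVerifier))
print("UpdatedVerifier usable:", can_instantiate(UpdatedVerifier))
```

Because the affected module is only built under a Java 17 profile, a default build never reaches the check that would flag the stale subclass.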





[jira] [Commented] (FLINK-35041) IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration failed

2024-05-10 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845274#comment-17845274
 ] 

Ryan Skraba commented on FLINK-35041:
-

* 1.20 Hadoop 3.1.3 / Test (module: core) 
https://github.com/apache/flink/actions/runs/9026237714/job/24803537384#step:10:8419
* 1.20 Java 21 / Test (module: core) 
https://github.com/apache/flink/actions/runs/9011311875/job/24758973855#step:10:8334
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/8999811164/job/24723153060#step:10:8487
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/8997755665/job/24716975457#step:10:9046
* 1.20 Java 11 / Test (module: core) 
https://github.com/apache/flink/actions/runs/8995101420/job/24709819637#step:10:8738
* 1.20 Java 21 / Test (module: core) 
https://github.com/apache/flink/actions/runs/8995101420/job/24709801069#step:10:8940
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/8985327313/job/24679590248#step:10:8686


> IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration failed
> --
>
> Key: FLINK-35041
> URL: https://issues.apache.org/jira/browse/FLINK-35041
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Affects Versions: 1.20.0
>Reporter: Weijie Guo
>Assignee: Feifan Wang
>Priority: Blocker
>
> {code:java}
> Apr 08 03:22:45 03:22:45.450 [ERROR] 
> org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration
>  -- Time elapsed: 0.034 s <<< FAILURE!
> Apr 08 03:22:45 org.opentest4j.AssertionFailedError: 
> Apr 08 03:22:45 
> Apr 08 03:22:45 expected: false
> Apr 08 03:22:45  but was: true
> Apr 08 03:22:45   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> Apr 08 03:22:45   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> Apr 08 03:22:45   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(K.java:45)
> Apr 08 03:22:45   at 
> org.apache.flink.runtime.state.DiscardRecordedStateObject.verifyDiscard(DiscardRecordedStateObject.java:34)
> Apr 08 03:22:45   at 
> org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration(IncrementalRemoteKeyedStateHandleTest.java:211)
> Apr 08 03:22:45   at java.lang.reflect.Method.invoke(Method.java:498)
> Apr 08 03:22:45   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> {code}
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58782=logs=77a9d8e1-d610-59b3-fc2a-4766541e0e33=125e07e7-8de0-5c6c-a541-a567415af3ef=9238]
>  





[jira] [Commented] (FLINK-35002) GitHub action request timeout to ArtifactService

2024-05-10 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845273#comment-17845273
 ] 

Ryan Skraba commented on FLINK-35002:
-

* 1.19 Java 11 / Compile 
https://github.com/apache/flink/commit/fa426f104baa1343a07695dcf4c4984814f0fde4/checks/24803211419/logs
* 1.18 Java 11 / Test (module: connect) 
https://github.com/apache/flink/commit/9d0858ee745bc835efa78a34d849d5f3ecb89f6d/checks/24709868165/logs


> GitHub action request timeout to ArtifactService
> -
>
> Key: FLINK-35002
> URL: https://issues.apache.org/jira/browse/FLINK-35002
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Reporter: Ryan Skraba
>Priority: Major
>  Labels: github-actions, test-stability
>
> A timeout can occur when uploading a successfully built artifact:
>  * [https://github.com/apache/flink/actions/runs/8516411871/job/23325392650]
> {code:java}
> 2024-04-02T02:20:15.6355368Z With the provided path, there will be 1 file uploaded
> 2024-04-02T02:20:15.6360133Z Artifact name is valid!
> 2024-04-02T02:20:15.6362872Z Root directory input is valid!
> 2024-04-02T02:20:20.6975036Z Attempt 1 of 5 failed with error: Request timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. Retrying request in 3000 ms...
> 2024-04-02T02:20:28.7084937Z Attempt 2 of 5 failed with error: Request timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. Retrying request in 4785 ms...
> 2024-04-02T02:20:38.5015936Z Attempt 3 of 5 failed with error: Request timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. Retrying request in 7375 ms...
> 2024-04-02T02:20:50.8901508Z Attempt 4 of 5 failed with error: Request timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. Retrying request in 14988 ms...
> 2024-04-02T02:21:10.9028438Z ##[error]Failed to CreateArtifact: Failed to make request after 5 attempts: Request timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact
> 2024-04-02T02:22:59.9893296Z Post job cleanup.
> 2024-04-02T02:22:59.9958844Z Post job cleanup.
> {code}
> (This is unlikely to be something we can fix, but we can track it.)
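
The log in the issue shows five attempts with growing, slightly irregular waits between them, which looks like retry with exponential backoff and jitter. A minimal sketch of that pattern follows. The function name, base delay, growth factor and jitter range are assumptions chosen for illustration, not the settings the upload action actually uses.

```python
import random
import time


def retry_with_backoff(operation, attempts=5, base_delay=3.0, factor=1.5,
                       jitter=0.25, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` calls have failed.

    The wait before retry n is base_delay * factor**(n - 1), scaled by a
    random factor in [1 - jitter, 1 + jitter]. All defaults are illustrative.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:  # a real client would catch narrower types
            last_error = error
            if attempt == attempts:
                break
            delay = base_delay * factor ** (attempt - 1)
            delay *= 1 + random.uniform(-jitter, jitter)
            print(f"Attempt {attempt} of {attempts} failed with error: {error}. "
                  f"Retrying request in {delay * 1000:.0f} ms...")
            sleep(delay)
    raise RuntimeError(
        f"Failed to make request after {attempts} attempts: {last_error}"
    ) from last_error
```

Passing a different {{sleep}} callable makes the helper easy to exercise without waiting. When every attempt times out, as in the log above, the only remaining option on our side is to rerun the job.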





[jira] [Commented] (FLINK-34404) GroupWindowAggregateProcTimeRestoreTest#testRestore times out

2024-05-10 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845271#comment-17845271
 ] 

Ryan Skraba commented on FLINK-34404:
-

(Using the same command line as above)
* 1.20 Default (Java 8) / Test (module: table) 
https://github.com/apache/flink/actions/runs/8999811164/job/24723153970#step:10:12716


> GroupWindowAggregateProcTimeRestoreTest#testRestore times out
> -
>
> Key: FLINK-34404
> URL: https://issues.apache.org/jira/browse/FLINK-34404
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.19.0, 1.20.0
>Reporter: Matthias Pohl
>Assignee: Alan Sheinberg
>Priority: Critical
>  Labels: test-stability
> Attachments: FLINK-34404.failure.log, FLINK-34404.success.log
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57357=logs=32715a4c-21b8-59a3-4171-744e5ab107eb=ff64056b-5320-5afe-c22c-6fa339e59586=11603
> {code}
> Feb 07 02:17:40 "ForkJoinPool-74-worker-1" #382 daemon prio=5 os_prio=0 
> cpu=282.22ms elapsed=961.78s tid=0x7f880a485c00 nid=0x6745 waiting on 
> condition  [0x7f878a6f9000]
> Feb 07 02:17:40    java.lang.Thread.State: WAITING (parking)
> Feb 07 02:17:40   at 
> jdk.internal.misc.Unsafe.park(java.base@17.0.7/Native Method)
> Feb 07 02:17:40   - parking to wait for  <0xff73d060> (a 
> java.util.concurrent.CompletableFuture$Signaller)
> Feb 07 02:17:40   at 
> java.util.concurrent.locks.LockSupport.park(java.base@17.0.7/LockSupport.java:211)
> Feb 07 02:17:40   at 
> java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.7/CompletableFuture.java:1864)
> Feb 07 02:17:40   at 
> java.util.concurrent.ForkJoinPool.compensatedBlock(java.base@17.0.7/ForkJoinPool.java:3449)
> Feb 07 02:17:40   at 
> java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.7/ForkJoinPool.java:3432)
> Feb 07 02:17:40   at 
> java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.7/CompletableFuture.java:1898)
> Feb 07 02:17:40   at 
> java.util.concurrent.CompletableFuture.get(java.base@17.0.7/CompletableFuture.java:2072)
> Feb 07 02:17:40   at 
> org.apache.flink.table.planner.plan.nodes.exec.testutils.RestoreTestBase.testRestore(RestoreTestBase.java:292)
> Feb 07 02:17:40   at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@17.0.7/Native 
> Method)
> Feb 07 02:17:40   at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@17.0.7/NativeMethodAccessorImpl.java:77)
> Feb 07 02:17:40   at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.7/DelegatingMethodAccessorImpl.java:43)
> Feb 07 02:17:40   at 
> java.lang.reflect.Method.invoke(java.base@17.0.7/Method.java:568)
> Feb 07 02:17:40   at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
> [...]
> {code}
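
When a CI log contains a full thread dump like the one quoted above, it can help to list only the threads that are parked or blocked before reading individual stacks. The sketch below assumes the dump format visible in the quote (a quoted thread name followed by {{#<id>}}, then a {{java.lang.Thread.State:}} line); the function name and the sample text, rejoined from the wrapped lines, are illustrative.

```python
import re

# A thread header looks like: "ForkJoinPool-74-worker-1" #382 daemon prio=5 ...
HEADER = re.compile(r'"([^"]+)" #\d+')
# The state line looks like: java.lang.Thread.State: WAITING (parking)
STATE = re.compile(r'java\.lang\.Thread\.State:\s*([A-Z_]+)')


def blocked_threads(dump, states=("WAITING", "TIMED_WAITING", "BLOCKED")):
    """Return (thread name, state) pairs for threads in one of `states`."""
    found = []
    name = None
    for line in dump.splitlines():
        header = HEADER.search(line)
        if header:
            name = header.group(1)
            continue
        state = STATE.search(line)
        if state and name is not None:
            if state.group(1) in states:
                found.append((name, state.group(1)))
            name = None  # only the first state line belongs to this thread
    return found


SAMPLE = '''Feb 07 02:17:40 "ForkJoinPool-74-worker-1" #382 daemon prio=5 os_prio=0 cpu=282.22ms
Feb 07 02:17:40    java.lang.Thread.State: WAITING (parking)
Feb 07 02:17:40   at jdk.internal.misc.Unsafe.park(java.base@17.0.7/Native Method)
'''

print(blocked_threads(SAMPLE))  # [('ForkJoinPool-74-worker-1', 'WAITING')]
```

The timestamp prefix is ignored because both patterns are searched anywhere in the line, so the same helper works on raw {{jstack}} output.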





[jira] [Commented] (FLINK-34273) git fetch fails

2024-05-10 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845270#comment-17845270
 ] 

Ryan Skraba commented on FLINK-34273:
-

* 1.20 test_cron_hadoop313 connect 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59424=logs=b6f8a893-8f59-51d5-fe28-fb56a8b0932c=a2aa31b1-3076-5dd3-ea01-4a81e1467181=384
* 1.20 test_ci misc 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59405=logs=fc5181b0-e452-5c8f-68de-1097947f6483=10163a1a-ea71-5414-a832-7701bff37ba3=380


> git fetch fails
> ---
>
> Key: FLINK-34273
> URL: https://issues.apache.org/jira/browse/FLINK-34273
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI, Test Infrastructure
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: test-stability
>
> We've seen multiple {{git fetch}} failures. I assume this to be an 
> infrastructure issue. This Jira issue is for documentation purposes.
> {code:java}
> error: RPC failed; curl 18 transfer closed with outstanding read data remaining
> error: 5211 bytes of body are still expected
> fetch-pack: unexpected disconnect while reading sideband packet
> fatal: early EOF
> fatal: fetch-pack: invalid index-pack output
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57080=logs=0e7be18f-84f2-53f0-a32d-4a5e4a174679=5d6dc3d3-393d-5111-3a40-c6a5a36202e6=667





[jira] [Commented] (FLINK-34224) ChangelogStorageMetricsTest.testAttemptsPerUpload(ChangelogStorageMetricsTest timed out

2024-05-10 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845269#comment-17845269
 ] 

Ryan Skraba commented on FLINK-34224:
-

* 1.18 Hadoop 3.1.3 / Test (module: core) 
https://github.com/apache/flink/actions/runs/9011311755/job/24759083100#step:10:10641

> ChangelogStorageMetricsTest.testAttemptsPerUpload(ChangelogStorageMetricsTest 
> timed out
> ---
>
> Key: FLINK-34224
> URL: https://issues.apache.org/jira/browse/FLINK-34224
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.19.0, 1.18.1
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions, test-stability
>
> The timeout appeared in the GitHub Actions workflow (currently in test phase; 
> [FLIP-396|https://cwiki.apache.org/confluence/display/FLINK/FLIP-396%3A+Trial+to+test+GitHub+Actions+as+an+alternative+for+Flink%27s+current+Azure+CI+infrastructure]):
> https://github.com/XComp/flink/actions/runs/7632434859/job/20793613726#step:10:11040
> {code}
> Jan 24 01:38:36 "ForkJoinPool-1-worker-1" #16 daemon prio=5 os_prio=0 
> tid=0x7f3b200ae800 nid=0x406e3 waiting on condition [0x7f3b1ba0e000]
> Jan 24 01:38:36    java.lang.Thread.State: WAITING (parking)
> Jan 24 01:38:36   at sun.misc.Unsafe.park(Native Method)
> Jan 24 01:38:36   - parking to wait for  <0xdfbbb358> (a 
> java.util.concurrent.CompletableFuture$Signaller)
> Jan 24 01:38:36   at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> Jan 24 01:38:36   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> Jan 24 01:38:36   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3313)
> Jan 24 01:38:36   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> Jan 24 01:38:36   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> Jan 24 01:38:36   at 
> org.apache.flink.changelog.fs.ChangelogStorageMetricsTest.testAttemptsPerUpload(ChangelogStorageMetricsTest.java:251)
> Jan 24 01:38:36   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> [...]
> {code}





[jira] [Commented] (FLINK-26644) python StreamExecutionEnvironmentTests.test_generate_stream_graph_with_dependencies failed on azure

2024-05-10 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845268#comment-17845268
 ] 

Ryan Skraba commented on FLINK-26644:
-

* 1.20 Java 11 / Test (module: python) 
https://github.com/apache/flink/actions/runs/9011311875/job/24758993707#step:10:25072

> python 
> StreamExecutionEnvironmentTests.test_generate_stream_graph_with_dependencies 
> failed on azure
> ---
>
> Key: FLINK-26644
> URL: https://issues.apache.org/jira/browse/FLINK-26644
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.14.4, 1.15.0, 1.16.0, 1.19.0
>Reporter: Yun Gao
>Priority: Minor
>  Labels: auto-deprioritized-major, test-stability
>
> {code:java}
> 2022-03-14T18:50:24.6842853Z Mar 14 18:50:24 
> === FAILURES 
> ===
> 2022-03-14T18:50:24.6844089Z Mar 14 18:50:24 _ 
> StreamExecutionEnvironmentTests.test_generate_stream_graph_with_dependencies _
> 2022-03-14T18:50:24.6844846Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6846063Z Mar 14 18:50:24 self = 
>   testMethod=test_generate_stream_graph_with_dependencies>
> 2022-03-14T18:50:24.6847104Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6847766Z Mar 14 18:50:24 def 
> test_generate_stream_graph_with_dependencies(self):
> 2022-03-14T18:50:24.6848677Z Mar 14 18:50:24 python_file_dir = 
> os.path.join(self.tempdir, "python_file_dir_" + str(uuid.uuid4()))
> 2022-03-14T18:50:24.6849833Z Mar 14 18:50:24 os.mkdir(python_file_dir)
> 2022-03-14T18:50:24.6850729Z Mar 14 18:50:24 python_file_path = 
> os.path.join(python_file_dir, "test_stream_dependency_manage_lib.py")
> 2022-03-14T18:50:24.6852679Z Mar 14 18:50:24 with 
> open(python_file_path, 'w') as f:
> 2022-03-14T18:50:24.6853646Z Mar 14 18:50:24 f.write("def 
> add_two(a):\nreturn a + 2")
> 2022-03-14T18:50:24.6854394Z Mar 14 18:50:24 env = self.env
> 2022-03-14T18:50:24.6855019Z Mar 14 18:50:24 
> env.add_python_file(python_file_path)
> 2022-03-14T18:50:24.6855519Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6856254Z Mar 14 18:50:24 def plus_two_map(value):
> 2022-03-14T18:50:24.6857045Z Mar 14 18:50:24 from 
> test_stream_dependency_manage_lib import add_two
> 2022-03-14T18:50:24.6857865Z Mar 14 18:50:24 return value[0], 
> add_two(value[1])
> 2022-03-14T18:50:24.6858466Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6858924Z Mar 14 18:50:24 def add_from_file(i):
> 2022-03-14T18:50:24.6859806Z Mar 14 18:50:24 with 
> open("data/data.txt", 'r') as f:
> 2022-03-14T18:50:24.6860266Z Mar 14 18:50:24 return i[0], 
> i[1] + int(f.read())
> 2022-03-14T18:50:24.6860879Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6862022Z Mar 14 18:50:24 from_collection_source = 
> env.from_collection([('a', 0), ('b', 0), ('c', 1), ('d', 1),
> 2022-03-14T18:50:24.6863259Z Mar 14 18:50:24  
>  ('e', 2)],
> 2022-03-14T18:50:24.6864057Z Mar 14 18:50:24  
> type_info=Types.ROW([Types.STRING(),
> 2022-03-14T18:50:24.6864651Z Mar 14 18:50:24  
>  Types.INT()]))
> 2022-03-14T18:50:24.6865150Z Mar 14 18:50:24 
> from_collection_source.name("From Collection")
> 2022-03-14T18:50:24.6866212Z Mar 14 18:50:24 keyed_stream = 
> from_collection_source.key_by(lambda x: x[1], key_type=Types.INT())
> 2022-03-14T18:50:24.6867083Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6867793Z Mar 14 18:50:24 plus_two_map_stream = 
> keyed_stream.map(plus_two_map).name("Plus Two Map").set_parallelism(3)
> 2022-03-14T18:50:24.6868620Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6869412Z Mar 14 18:50:24 add_from_file_map = 
> plus_two_map_stream.map(add_from_file).name("Add From File Map")
> 2022-03-14T18:50:24.6870239Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6870883Z Mar 14 18:50:24 test_stream_sink = 
> add_from_file_map.add_sink(self.test_sink).name("Test Sink")
> 2022-03-14T18:50:24.6871803Z Mar 14 18:50:24 
> test_stream_sink.set_parallelism(4)
> 2022-03-14T18:50:24.6872291Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6872756Z Mar 14 18:50:24 archive_dir_path = 
> os.path.join(self.tempdir, "archive_" + str(uuid.uuid4()))
> 2022-03-14T18:50:24.6873557Z Mar 14 18:50:24 
> os.mkdir(archive_dir_path)
> 2022-03-14T18:50:24.6874817Z Mar 14 18:50:24 with 
> open(os.path.join(archive_dir_path, "data.txt"), 'w') as f:
> 2022-03-14T18:50:24.6875414Z Mar 14 18:50:24 f.write("3")
> 2022-03-14T18:50:24.6875906Z Mar 14 

[jira] [Commented] (FLINK-35041) IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration failed

2024-05-07 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844185#comment-17844185
 ] 

Ryan Skraba commented on FLINK-35041:
-

* 1.20 test_cron_jdk21 core 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59356=logs=d06b80b4-9e88-5d40-12a2-18072cf60528=609ecd5a-3f6e-5d0c-2239-2096b155a4d0=8870

> IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration failed
> --
>
> Key: FLINK-35041
> URL: https://issues.apache.org/jira/browse/FLINK-35041
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Affects Versions: 1.20.0
>Reporter: Weijie Guo
>Assignee: Feifan Wang
>Priority: Blocker
>
> {code:java}
> Apr 08 03:22:45 03:22:45.450 [ERROR] 
> org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration
>  -- Time elapsed: 0.034 s <<< FAILURE!
> Apr 08 03:22:45 org.opentest4j.AssertionFailedError: 
> Apr 08 03:22:45 
> Apr 08 03:22:45 expected: false
> Apr 08 03:22:45  but was: true
> Apr 08 03:22:45   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> Apr 08 03:22:45   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> Apr 08 03:22:45   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(K.java:45)
> Apr 08 03:22:45   at 
> org.apache.flink.runtime.state.DiscardRecordedStateObject.verifyDiscard(DiscardRecordedStateObject.java:34)
> Apr 08 03:22:45   at 
> org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration(IncrementalRemoteKeyedStateHandleTest.java:211)
> Apr 08 03:22:45   at java.lang.reflect.Method.invoke(Method.java:498)
> Apr 08 03:22:45   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> {code}
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58782=logs=77a9d8e1-d610-59b3-fc2a-4766541e0e33=125e07e7-8de0-5c6c-a541-a567415af3ef=9238]
>  





[jira] [Commented] (FLINK-35002) GitHub action request timeout to ArtifactService

2024-05-07 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844187#comment-17844187
 ] 

Ryan Skraba commented on FLINK-35002:
-

* 1.19 Java 8 / E2E (group 2) 
https://github.com/apache/flink/commit/fa426f104baa1343a07695dcf4c4984814f0fde4/checks/24659542455/logs

> GitHub action request timeout to ArtifactService
> -
>
> Key: FLINK-35002
> URL: https://issues.apache.org/jira/browse/FLINK-35002
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Reporter: Ryan Skraba
>Priority: Major
>  Labels: github-actions, test-stability
>
> A timeout can occur when uploading a successfully built artifact:
>  * [https://github.com/apache/flink/actions/runs/8516411871/job/23325392650]
> {code:java}
> 2024-04-02T02:20:15.6355368Z With the provided path, there will be 1 file 
> uploaded
> 2024-04-02T02:20:15.6360133Z Artifact name is valid!
> 2024-04-02T02:20:15.6362872Z Root directory input is valid!
> 2024-04-02T02:20:20.6975036Z Attempt 1 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 3000 ms...
> 2024-04-02T02:20:28.7084937Z Attempt 2 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 4785 ms...
> 2024-04-02T02:20:38.5015936Z Attempt 3 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 7375 ms...
> 2024-04-02T02:20:50.8901508Z Attempt 4 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 14988 ms...
> 2024-04-02T02:21:10.9028438Z ##[error]Failed to CreateArtifact: Failed to 
> make request after 5 attempts: Request timeout: 
> /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact
> 2024-04-02T02:22:59.9893296Z Post job cleanup.
> 2024-04-02T02:22:59.9958844Z Post job cleanup. {code}
> (This is unlikely to be something we can fix, but we can track it.)





[jira] [Commented] (FLINK-34227) Job doesn't disconnect from ResourceManager

2024-05-06 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843671#comment-17843671
 ] 

Ryan Skraba commented on FLINK-34227:
-

* 1.18 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/8947210915/job/24579026057#step:10:14494


> Job doesn't disconnect from ResourceManager
> ---
>
> Key: FLINK-34227
> URL: https://issues.apache.org/jira/browse/FLINK-34227
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0, 1.18.1
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: github-actions, pull-request-available, test-stability
> Attachments: FLINK-34227.7e7d69daebb438b8d03b7392c9c55115.log, 
> FLINK-34227.log
>
>
> https://github.com/XComp/flink/actions/runs/7634987973/job/20800205972#step:10:14557
> {code}
> [...]
> "main" #1 prio=5 os_prio=0 tid=0x7f4b7000 nid=0x24ec0 waiting on 
> condition [0x7fccce1eb000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xbdd52618> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2131)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2099)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2077)
>   at 
> org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:876)
>   at 
> org.apache.flink.table.planner.runtime.stream.sql.WindowDistinctAggregateITCase.testHopWindow_Cube(WindowDistinctAggregateITCase.scala:550)
> [...]
> {code}





[jira] [Commented] (FLINK-26644) python StreamExecutionEnvironmentTests.test_generate_stream_graph_with_dependencies failed on azure

2024-05-06 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843670#comment-17843670
 ] 

Ryan Skraba commented on FLINK-26644:
-

* 1.20 Java 8 / Test (module: python) 
https://github.com/apache/flink/actions/runs/8963069794/job/24613025524#step:10:25110


> python 
> StreamExecutionEnvironmentTests.test_generate_stream_graph_with_dependencies 
> failed on azure
> ---
>
> Key: FLINK-26644
> URL: https://issues.apache.org/jira/browse/FLINK-26644
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python
>Affects Versions: 1.14.4, 1.15.0, 1.16.0, 1.19.0
>Reporter: Yun Gao
>Priority: Minor
>  Labels: auto-deprioritized-major, test-stability
>
> {code:java}
> 2022-03-14T18:50:24.6842853Z Mar 14 18:50:24 
> === FAILURES 
> ===
> 2022-03-14T18:50:24.6844089Z Mar 14 18:50:24 _ 
> StreamExecutionEnvironmentTests.test_generate_stream_graph_with_dependencies _
> 2022-03-14T18:50:24.6844846Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6846063Z Mar 14 18:50:24 self = 
>   testMethod=test_generate_stream_graph_with_dependencies>
> 2022-03-14T18:50:24.6847104Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6847766Z Mar 14 18:50:24 def 
> test_generate_stream_graph_with_dependencies(self):
> 2022-03-14T18:50:24.6848677Z Mar 14 18:50:24 python_file_dir = 
> os.path.join(self.tempdir, "python_file_dir_" + str(uuid.uuid4()))
> 2022-03-14T18:50:24.6849833Z Mar 14 18:50:24 os.mkdir(python_file_dir)
> 2022-03-14T18:50:24.6850729Z Mar 14 18:50:24 python_file_path = 
> os.path.join(python_file_dir, "test_stream_dependency_manage_lib.py")
> 2022-03-14T18:50:24.6852679Z Mar 14 18:50:24 with 
> open(python_file_path, 'w') as f:
> 2022-03-14T18:50:24.6853646Z Mar 14 18:50:24 f.write("def 
> add_two(a):\nreturn a + 2")
> 2022-03-14T18:50:24.6854394Z Mar 14 18:50:24 env = self.env
> 2022-03-14T18:50:24.6855019Z Mar 14 18:50:24 
> env.add_python_file(python_file_path)
> 2022-03-14T18:50:24.6855519Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6856254Z Mar 14 18:50:24 def plus_two_map(value):
> 2022-03-14T18:50:24.6857045Z Mar 14 18:50:24 from 
> test_stream_dependency_manage_lib import add_two
> 2022-03-14T18:50:24.6857865Z Mar 14 18:50:24 return value[0], 
> add_two(value[1])
> 2022-03-14T18:50:24.6858466Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6858924Z Mar 14 18:50:24 def add_from_file(i):
> 2022-03-14T18:50:24.6859806Z Mar 14 18:50:24 with 
> open("data/data.txt", 'r') as f:
> 2022-03-14T18:50:24.6860266Z Mar 14 18:50:24 return i[0], 
> i[1] + int(f.read())
> 2022-03-14T18:50:24.6860879Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6862022Z Mar 14 18:50:24 from_collection_source = 
> env.from_collection([('a', 0), ('b', 0), ('c', 1), ('d', 1),
> 2022-03-14T18:50:24.6863259Z Mar 14 18:50:24  
>  ('e', 2)],
> 2022-03-14T18:50:24.6864057Z Mar 14 18:50:24  
> type_info=Types.ROW([Types.STRING(),
> 2022-03-14T18:50:24.6864651Z Mar 14 18:50:24  
>  Types.INT()]))
> 2022-03-14T18:50:24.6865150Z Mar 14 18:50:24 
> from_collection_source.name("From Collection")
> 2022-03-14T18:50:24.6866212Z Mar 14 18:50:24 keyed_stream = 
> from_collection_source.key_by(lambda x: x[1], key_type=Types.INT())
> 2022-03-14T18:50:24.6867083Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6867793Z Mar 14 18:50:24 plus_two_map_stream = 
> keyed_stream.map(plus_two_map).name("Plus Two Map").set_parallelism(3)
> 2022-03-14T18:50:24.6868620Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6869412Z Mar 14 18:50:24 add_from_file_map = 
> plus_two_map_stream.map(add_from_file).name("Add From File Map")
> 2022-03-14T18:50:24.6870239Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6870883Z Mar 14 18:50:24 test_stream_sink = 
> add_from_file_map.add_sink(self.test_sink).name("Test Sink")
> 2022-03-14T18:50:24.6871803Z Mar 14 18:50:24 
> test_stream_sink.set_parallelism(4)
> 2022-03-14T18:50:24.6872291Z Mar 14 18:50:24 
> 2022-03-14T18:50:24.6872756Z Mar 14 18:50:24 archive_dir_path = 
> os.path.join(self.tempdir, "archive_" + str(uuid.uuid4()))
> 2022-03-14T18:50:24.6873557Z Mar 14 18:50:24 
> os.mkdir(archive_dir_path)
> 2022-03-14T18:50:24.6874817Z Mar 14 18:50:24 with 
> open(os.path.join(archive_dir_path, "data.txt"), 'w') as f:
> 2022-03-14T18:50:24.6875414Z Mar 14 18:50:24 f.write("3")
> 2022-03-14T18:50:24.6875906Z Mar 14 

[jira] [Commented] (FLINK-18476) PythonEnvUtilsTest#testStartPythonProcess fails

2024-05-06 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843668#comment-17843668
 ] 

Ryan Skraba commented on FLINK-18476:
-

* 1.18 Java 17 / Test (module: misc) 
https://github.com/apache/flink/actions/runs/8954955161/job/24595336861#step:10:21745

> PythonEnvUtilsTest#testStartPythonProcess fails
> ---
>
> Key: FLINK-18476
> URL: https://issues.apache.org/jira/browse/FLINK-18476
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Tests
>Affects Versions: 1.11.0, 1.15.3, 1.18.0, 1.19.0, 1.20.0
>Reporter: Dawid Wysakowicz
>Priority: Major
>  Labels: auto-deprioritized-major, auto-deprioritized-minor, 
> test-stability
>
> The 
> {{org.apache.flink.client.python.PythonEnvUtilsTest#testStartPythonProcess}} 
> failed in my local environment as it assumes the environment has 
> {{/usr/bin/python}}. 
> I don't know exactly how I got python on Ubuntu 20.04, but I only have an 
> alias {{python = python3}}. Therefore the test fails.
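
The description above comes down to a test assuming a fixed interpreter path. A minimal sketch of the alternative, looking the interpreter up on the PATH, is given here in Python rather than in the Java of the test itself. The helper name find_python and the candidate order are assumptions for illustration, not part of Flink.

```python
import shutil
import sys


def find_python(candidates=("python", "python3")):
    """Return the path of the first candidate executable found on PATH.

    Falls back to the interpreter running this script when no candidate
    is found. Shell aliases (such as python = python3) are not visible
    to a PATH lookup, which is why python3 is tried explicitly.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return sys.executable


if __name__ == "__main__":
    print(find_python())
```

A test written this way would skip or adapt when only python3 is installed instead of failing on the missing /usr/bin/python.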



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35041) IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration failed

2024-05-06 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843669#comment-17843669
 ] 

Ryan Skraba commented on FLINK-35041:
-

* 1.20 Java 17 / Test (module: core) 
https://github.com/apache/flink/commit/beb0b167bdcf95f27be87a214a69a174fd49d256/checks/24613040802/logs
* 1.20 Java 11 / Test (module: core) 
https://github.com/apache/flink/actions/runs/8954955141/job/24595323637#step:10:7795
* 1.20 Hadoop 3.1.3 / Test (module: core) 
https://github.com/apache/flink/actions/runs/8954955141/job/24595381176#step:10:8858


> IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration failed
> --
>
> Key: FLINK-35041
> URL: https://issues.apache.org/jira/browse/FLINK-35041
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Affects Versions: 1.20.0
>Reporter: Weijie Guo
>Assignee: Feifan Wang
>Priority: Blocker
>
> {code:java}
> Apr 08 03:22:45 03:22:45.450 [ERROR] 
> org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration
>  -- Time elapsed: 0.034 s <<< FAILURE!
> Apr 08 03:22:45 org.opentest4j.AssertionFailedError: 
> Apr 08 03:22:45 
> Apr 08 03:22:45 expected: false
> Apr 08 03:22:45  but was: true
> Apr 08 03:22:45   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> Apr 08 03:22:45   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> Apr 08 03:22:45   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(K.java:45)
> Apr 08 03:22:45   at 
> org.apache.flink.runtime.state.DiscardRecordedStateObject.verifyDiscard(DiscardRecordedStateObject.java:34)
> Apr 08 03:22:45   at 
> org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration(IncrementalRemoteKeyedStateHandleTest.java:211)
> Apr 08 03:22:45   at java.lang.reflect.Method.invoke(Method.java:498)
> Apr 08 03:22:45   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> {code}
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58782=logs=77a9d8e1-d610-59b3-fc2a-4766541e0e33=125e07e7-8de0-5c6c-a541-a567415af3ef=9238]
>  





[jira] [Commented] (FLINK-18476) PythonEnvUtilsTest#testStartPythonProcess fails

2024-05-03 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843261#comment-17843261
 ] 

Ryan Skraba commented on FLINK-18476:
-

* 1.20 Java 21 / Test (module: misc) 
https://github.com/apache/flink/actions/runs/221960/job/24404965886#step:10:22919

> PythonEnvUtilsTest#testStartPythonProcess fails
> ---
>
> Key: FLINK-18476
> URL: https://issues.apache.org/jira/browse/FLINK-18476
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Tests
>Affects Versions: 1.11.0, 1.15.3, 1.18.0, 1.19.0, 1.20.0
>Reporter: Dawid Wysakowicz
>Priority: Major
>  Labels: auto-deprioritized-major, auto-deprioritized-minor, 
> test-stability
>
> The 
> {{org.apache.flink.client.python.PythonEnvUtilsTest#testStartPythonProcess}} 
> failed in my local environment as it assumes the environment has 
> {{/usr/bin/python}}. 
> I don't know exactly how I got python on Ubuntu 20.04, but I only have an 
> alias {{python = python3}}. Therefore the test fails.





[jira] [Commented] (FLINK-34227) Job doesn't disconnect from ResourceManager

2024-05-03 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843260#comment-17843260
 ] 

Ryan Skraba commented on FLINK-34227:
-

* 1.18 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/8904361381/job/24453748069#step:10:14980
* 1.18 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/8809948818/job/24181785187#step:10:17166


> Job doesn't disconnect from ResourceManager
> ---
>
> Key: FLINK-34227
> URL: https://issues.apache.org/jira/browse/FLINK-34227
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0, 1.18.1
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: github-actions, pull-request-available, test-stability
> Attachments: FLINK-34227.7e7d69daebb438b8d03b7392c9c55115.log, 
> FLINK-34227.log
>
>
> https://github.com/XComp/flink/actions/runs/7634987973/job/20800205972#step:10:14557
> {code}
> [...]
> "main" #1 prio=5 os_prio=0 tid=0x7f4b7000 nid=0x24ec0 waiting on 
> condition [0x7fccce1eb000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xbdd52618> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2131)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2099)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2077)
>   at 
> org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:876)
>   at 
> org.apache.flink.table.planner.runtime.stream.sql.WindowDistinctAggregateITCase.testHopWindow_Cube(WindowDistinctAggregateITCase.scala:550)
> [...]
> {code}





[jira] [Commented] (FLINK-35246) SqlClientSSLTest.testGatewayMode failed in AZP

2024-05-03 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843257#comment-17843257
 ] 

Ryan Skraba commented on FLINK-35246:
-

* 1.20 Java 17 / Test (module: table) 
https://github.com/apache/flink/actions/runs/8842083488/job/24280428940#step:10:12462
* 1.20 Java 21 / Test (module: table) 
https://github.com/apache/flink/actions/runs/8842083488/job/24280416340#step:10:12463

> SqlClientSSLTest.testGatewayMode failed in AZP
> --
>
> Key: FLINK-35246
> URL: https://issues.apache.org/jira/browse/FLINK-35246
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Reporter: Weijie Guo
>Assignee: Weijie Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> {code:java}
> Apr 26 01:51:10 java.lang.IllegalArgumentException: The given host:port 
> ('localhost/:36112') doesn't contain a valid port
> Apr 26 01:51:10   at 
> org.apache.flink.util.NetUtils.validateHostPortString(NetUtils.java:120)
> Apr 26 01:51:10   at 
> org.apache.flink.util.NetUtils.getCorrectHostnamePort(NetUtils.java:81)
> Apr 26 01:51:10   at 
> org.apache.flink.table.client.cli.CliOptionsParser.parseGatewayAddress(CliOptionsParser.java:325)
> Apr 26 01:51:10   at 
> org.apache.flink.table.client.cli.CliOptionsParser.parseGatewayModeClient(CliOptionsParser.java:296)
> Apr 26 01:51:10   at 
> org.apache.flink.table.client.SqlClient.startClient(SqlClient.java:207)
> Apr 26 01:51:10   at 
> org.apache.flink.table.client.SqlClientTestBase.runSqlClient(SqlClientTestBase.java:111)
> Apr 26 01:51:10   at 
> org.apache.flink.table.client.SqlClientSSLTest.testGatewayMode(SqlClientSSLTest.java:74)
> Apr 26 01:51:10   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:580)
> Apr 26 01:51:10   at 
> java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194)
> Apr 26 01:51:10   at 
> java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
> Apr 26 01:51:10   at 
> java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312)
> Apr 26 01:51:10   at 
> java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843)
> Apr 26 01:51:10   at 
> java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808)
> Apr 26 01:51:10   at 
> java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59173=logs=26b84117-e436-5720-913e-3e280ce55cae=77cc7e77-39a0-5007-6d65-4137ac13a471=12418
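
The string in the exception, 'localhost/:36112', has a slash between the host and the colon. A URL-style parser then reads ':36112' as a path rather than as a port. The sketch below reproduces that effect with Python's standard urllib.parse; it is an illustration only and not Flink's NetUtils implementation, and the function name split_host_port is hypothetical.

```python
from urllib.parse import urlsplit


def split_host_port(host_port):
    """Split 'host:port' into (host, port).

    Raises ValueError when the string has no usable port. The '//' prefix
    makes urlsplit treat the input as a network location.
    """
    parts = urlsplit("//" + host_port)
    # Accessing .port raises ValueError for non-numeric or out-of-range ports.
    port = parts.port
    if parts.hostname is None or port is None:
        raise ValueError(
            f"The given host:port ('{host_port}') doesn't contain a valid port"
        )
    return parts.hostname, port


if __name__ == "__main__":
    print(split_host_port("localhost:36112"))
    try:
        split_host_port("localhost/:36112")
    except ValueError as err:
        print(err)
```

With the slash present, everything after it lands in the path component, so the port lookup comes back empty and the check fails.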





[jira] [Commented] (FLINK-35095) ExecutionEnvironmentImplTest.testFromSource failure on GitHub CI

2024-05-03 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843263#comment-17843263
 ] 

Ryan Skraba commented on FLINK-35095:
-

* 1.20 Java 21 / Test (module: misc) 
https://github.com/apache/flink/commit/80af4d502318348ba15a8f75a2a622ce9dbdc968/checks/24453751708/logs
* 1.20 Hadoop 3.1.3 / Test (module: misc) 
https://github.com/apache/flink/actions/runs/8809949034/job/24182253915#step:10:22352

> ExecutionEnvironmentImplTest.testFromSource failure on GitHub CI
> 
>
> Key: FLINK-35095
> URL: https://issues.apache.org/jira/browse/FLINK-35095
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.20.0
>Reporter: Ryan Skraba
>Priority: Critical
>  Labels: test-stability
>
> 1.20 Java 17: Test (module: misc) 
> https://github.com/apache/flink/actions/runs/8655935935/job/23735920630#step:10:3
> {code}
> Error: 02:29:05 02:29:05.708 [ERROR] Tests run: 5, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 0.360 s <<< FAILURE! -- in 
> org.apache.flink.datastream.impl.ExecutionEnvironmentImplTest
> Error: 02:29:05 02:29:05.708 [ERROR] 
> org.apache.flink.datastream.impl.ExecutionEnvironmentImplTest.testFromSource 
> -- Time elapsed: 0.131 s <<< FAILURE!
> Apr 12 02:29:05 java.lang.AssertionError: 
> Apr 12 02:29:05 
> Apr 12 02:29:05 Expecting actual:
> Apr 12 02:29:05   [47]
> Apr 12 02:29:05 to contain exactly (and in same order):
> Apr 12 02:29:05   [48]
> Apr 12 02:29:05 but some elements were not found:
> Apr 12 02:29:05   [48]
> Apr 12 02:29:05 and others were not expected:
> Apr 12 02:29:05   [47]
> Apr 12 02:29:05 
> Apr 12 02:29:05   at 
> org.apache.flink.datastream.impl.ExecutionEnvironmentImplTest.testFromSource(ExecutionEnvironmentImplTest.java:97)
> Apr 12 02:29:05   at 
> java.base/java.lang.reflect.Method.invoke(Method.java:568)
> Apr 12 02:29:05   at 
> java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194)
> Apr 12 02:29:05   at 
> java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
> Apr 12 02:29:05   at 
> java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
> Apr 12 02:29:05   at 
> java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
> Apr 12 02:29:05   at 
> java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
> Apr 12 02:29:05   at 
> java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
> Apr 12 02:29:05 
> {code}





[jira] [Commented] (FLINK-35002) GitHub action request timeout to ArtifactService

2024-05-03 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843262#comment-17843262
 ] 

Ryan Skraba commented on FLINK-35002:
-

* 1.19 AdaptiveScheduler / Compile 
https://github.com/apache/flink/commit/ac4aa35c6e2e2da87760ffbf45d85888b1976c2f/checks/24453516397/logs
* 1.20 Java 8 / Compile 
https://github.com/apache/flink/commit/e412402ca4dfc438e28fb990dc53ea7809430aee/checks/24356511040/logs
* 1.19 Java 8 / Test (module: table) 
https://github.com/apache/flink/commit/e7816f714ef5298e1ca978aeddf62732794bb93f/checks/24231189927/logs
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/8810747051/job/24183773837#step:14:31

> GitHub action request timeout to ArtifactService
> -
>
> Key: FLINK-35002
> URL: https://issues.apache.org/jira/browse/FLINK-35002
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Reporter: Ryan Skraba
>Priority: Major
>  Labels: github-actions, test-stability
>
> A timeout can occur when uploading a successfully built artifact:
>  * [https://github.com/apache/flink/actions/runs/8516411871/job/23325392650]
> {code:java}
> 2024-04-02T02:20:15.6355368Z With the provided path, there will be 1 file 
> uploaded
> 2024-04-02T02:20:15.6360133Z Artifact name is valid!
> 2024-04-02T02:20:15.6362872Z Root directory input is valid!
> 2024-04-02T02:20:20.6975036Z Attempt 1 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 3000 ms...
> 2024-04-02T02:20:28.7084937Z Attempt 2 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 4785 ms...
> 2024-04-02T02:20:38.5015936Z Attempt 3 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 7375 ms...
> 2024-04-02T02:20:50.8901508Z Attempt 4 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 14988 ms...
> 2024-04-02T02:21:10.9028438Z ##[error]Failed to CreateArtifact: Failed to 
> make request after 5 attempts: Request timeout: 
> /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact
> 2024-04-02T02:22:59.9893296Z Post job cleanup.
> 2024-04-02T02:22:59.9958844Z Post job cleanup. {code}
> (This is unlikely to be something we can fix, but we can track it.)
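
The log above shows a bounded retry loop: five attempts, with a wait that grows between them (3000 ms, 4785 ms, 7375 ms, 14988 ms). A minimal Python sketch of that pattern follows. The base delay, growth factor, and jitter range are assumptions chosen for illustration, not the values used by the GitHub upload action, and the function name retry_with_backoff is hypothetical.

```python
import random
import time


def retry_with_backoff(operation, attempts=5, base_delay_s=3.0, factor=1.5,
                       retry_on=(TimeoutError,), sleep=time.sleep,
                       rng=random.random):
    """Call operation() until it succeeds or the attempts are used up.

    After each failure the wait grows by `factor` and is stretched by a
    random jitter of up to 50%. The last failure is re-raised unchanged.
    """
    delay = base_delay_s
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on as err:
            if attempt == attempts:
                raise
            wait = delay * (1.0 + 0.5 * rng())
            print(f"Attempt {attempt} of {attempts} failed with error: {err}. "
                  f"Retrying request in {wait * 1000:.0f} ms...")
            sleep(wait)
            delay *= factor
```

Passing sleep and rng as parameters keeps the loop testable without real waiting. If the service stays unavailable for longer than the total wait, as in the log, the final attempt still fails and the error reaches the caller.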





[jira] [Commented] (FLINK-30644) ChangelogCompatibilityITCase.testRestore fails due to CheckpointCoordinator being shutdown

2024-05-03 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843259#comment-17843259
 ] 

Ryan Skraba commented on FLINK-30644:
-

* 1.20 Java 11 / Test (module: tests) 
https://github.com/apache/flink/actions/runs/8856547891/job/24323134209#step:10:7762

> ChangelogCompatibilityITCase.testRestore fails due to CheckpointCoordinator 
> being shutdown
> --
>
> Key: FLINK-30644
> URL: https://issues.apache.org/jira/browse/FLINK-30644
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Runtime / State Backends
>Affects Versions: 1.17.0, 1.19.1
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: auto-deprioritized-critical, test-stability
>
> We observe a build failure in {{ChangelogCompatibilityITCase.testRestore}} 
> due to the {{CheckpointCoordinator}} being shut down:
> {code:java}
> [...]
> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: 
> CheckpointCoordinator shutdown.
> Jan 12 02:37:37   at 
> org.apache.flink.runtime.checkpoint.PendingCheckpoint.abort(PendingCheckpoint.java:544)
> Jan 12 02:37:37   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2140)
> Jan 12 02:37:37   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2127)
> Jan 12 02:37:37   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoints(CheckpointCoordinator.java:2004)
> Jan 12 02:37:37   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoints(CheckpointCoordinator.java:1987)
> Jan 12 02:37:37   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingAndQueuedCheckpoints(CheckpointCoordinator.java:2183)
> Jan 12 02:37:37   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.shutdown(CheckpointCoordinator.java:426)
> Jan 12 02:37:37   at 
> org.apache.flink.runtime.executiongraph.DefaultExecutionGraph.onTerminalState(DefaultExecutionGraph.java:1329)
> [...]{code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44731=logs=2c3cbe13-dee0-5837-cf47-3053da9a8a78=b78d9d30-509a-5cea-1fef-db7abaa325ae=9255





[jira] [Commented] (FLINK-28440) EventTimeWindowCheckpointingITCase failed with restore

2024-05-03 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843256#comment-17843256
 ] 

Ryan Skraba commented on FLINK-28440:
-

* 1.20 Default (Java 8) / Test (module: tests) 
https://github.com/apache/flink/actions/runs/8901164251/job/2807095#step:10:7971
* 1.20 Default (Java 8) / Test (module: tests) 
https://github.com/apache/flink/actions/runs/8887882381/job/24404087819#step:10:8262

> EventTimeWindowCheckpointingITCase failed with restore
> --
>
> Key: FLINK-28440
> URL: https://issues.apache.org/jira/browse/FLINK-28440
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / State Backends
>Affects Versions: 1.16.0, 1.17.0, 1.18.0, 1.19.0
>Reporter: Huang Xingbo
>Assignee: Yanfei Lei
>Priority: Critical
>  Labels: auto-deprioritized-critical, pull-request-available, 
> stale-assigned, test-stability
> Fix For: 1.20.0
>
> Attachments: image-2023-02-01-00-51-54-506.png, 
> image-2023-02-01-01-10-01-521.png, image-2023-02-01-01-19-12-182.png, 
> image-2023-02-01-16-47-23-756.png, image-2023-02-01-16-57-43-889.png, 
> image-2023-02-02-10-52-56-599.png, image-2023-02-03-10-09-07-586.png, 
> image-2023-02-03-12-03-16-155.png, image-2023-02-03-12-03-56-614.png
>
>
> {code:java}
> Caused by: java.lang.Exception: Exception while creating 
> StreamOperatorStateContext.
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:256)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268)
>   at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:722)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:698)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:665)
>   at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935)
>   at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:904)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for WindowOperator_0a448493b4782967b150582570326227_(2/4) from 
> any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:353)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:165)
>   ... 11 more
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
> /tmp/junit1835099326935900400/junit1113650082510421526/52ee65b7-033f-4429-8ddd-adbe85e27ced
>  (No such file or directory)
>   at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.advance(StateChangelogHandleStreamHandleReader.java:87)
>   at 
> org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.hasNext(StateChangelogHandleStreamHandleReader.java:69)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.readBackendHandle(ChangelogBackendRestoreOperation.java:96)
>   at 
> org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.restore(ChangelogBackendRestoreOperation.java:75)
>   at 
> org.apache.flink.state.changelog.ChangelogStateBackend.restore(ChangelogStateBackend.java:92)
>   at 
> org.apache.flink.state.changelog.AbstractChangelogStateBackend.createKeyedStateBackend(AbstractChangelogStateBackend.java:136)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:336)
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
>   at 
> 

[jira] [Commented] (FLINK-35041) IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration failed

2024-05-03 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843258#comment-17843258
 ] 

Ryan Skraba commented on FLINK-35041:
-

Going through some of the older GitHub actions from the last week, there are a 
lot of these:
 
* 1.20 Java 11 / Test (module: core) 
https://github.com/apache/flink/actions/runs/8917610620/job/24491172511#step:10:8154
* 1.20 Java 21 / Test (module: core) 
https://github.com/apache/flink/actions/runs/8917610620/job/24491154789#step:10:8873
* 1.20 Java 11 / Test (module: core) 
https://github.com/apache/flink/actions/runs/221960/job/24404966761#step:10:7787
* 1.20 AdaptiveScheduler / Test (module: core) 
https://github.com/apache/flink/actions/runs/221960/job/24404939797#step:10:8361
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/8874021289/job/24361049250#step:10:8308
* 1.20 Java 17 / Test (module: core) 
https://github.com/apache/flink/actions/runs/8872328953/job/24356752585#step:10:8911
* 1.20 Java 11 / Test (module: core) 
https://github.com/apache/flink/actions/runs/8864296312/job/24339779126#step:10:9083
* 1.20 Java 21 / Test (module: core) 
https://github.com/apache/flink/actions/runs/8856547891/job/24323115199#step:10:8933
* 1.20 Java 11 / Test (module: core) 
https://github.com/apache/flink/actions/runs/8842083488/job/24280420760#step:10:8265
* 1.20 Java 17 / Test (module: core) 
https://github.com/apache/flink/actions/runs/8825970497/job/24231219571#step:10:9087
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/8825652254/job/24230389260#step:10:9141
* 1.20 Java 21 / Test (module: core) 
https://github.com/apache/flink/actions/runs/8809949034/job/24182328046#step:10:8078
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/8800044378/job/24153034222#step:10:8261
* 1.20 Java 17 / Test (module: core) 
https://github.com/apache/flink/actions/runs/8793750647/job/24132431375#step:10:7754
* 1.20 Default (Java 8) / Test (module: core) 
https://github.com/apache/flink/actions/runs/8784906766/job/24104618074#step:10:8444


> IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration failed
> --
>
> Key: FLINK-35041
> URL: https://issues.apache.org/jira/browse/FLINK-35041
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Affects Versions: 1.20.0
>Reporter: Weijie Guo
>Assignee: Feifan Wang
>Priority: Blocker
>
> {code:java}
> Apr 08 03:22:45 03:22:45.450 [ERROR] 
> org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration
>  -- Time elapsed: 0.034 s <<< FAILURE!
> Apr 08 03:22:45 org.opentest4j.AssertionFailedError: 
> Apr 08 03:22:45 
> Apr 08 03:22:45 expected: false
> Apr 08 03:22:45  but was: true
> Apr 08 03:22:45   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> Apr 08 03:22:45   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> Apr 08 03:22:45   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(K.java:45)
> Apr 08 03:22:45   at 
> org.apache.flink.runtime.state.DiscardRecordedStateObject.verifyDiscard(DiscardRecordedStateObject.java:34)
> Apr 08 03:22:45   at 
> org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration(IncrementalRemoteKeyedStateHandleTest.java:211)
> Apr 08 03:22:45   at java.lang.reflect.Method.invoke(Method.java:498)
> Apr 08 03:22:45   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> {code}
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58782=logs=77a9d8e1-d610-59b3-fc2a-4766541e0e33=125e07e7-8de0-5c6c-a541-a567415af3ef=9238]
>  





[jira] [Commented] (FLINK-34645) StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount fails

2024-05-03 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843255#comment-17843255
 ] 

Ryan Skraba commented on FLINK-34645:
-

1.18 Java 11 / Test (module: misc) 
https://github.com/apache/flink/actions/runs/8825970611/job/24231267277#step:10:21751

> StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount
>  fails
> 
>
> Key: FLINK-34645
> URL: https://issues.apache.org/jira/browse/FLINK-34645
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.18.1
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions, test-stability
>
> {code}
> Error: 02:27:17 02:27:17.025 [ERROR] Tests run: 3, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 0.658 s <<< FAILURE! - in 
> org.apache.flink.table.runtime.operators.python.aggregate.arrow.stream.StreamArrowPythonGroupWindowAggregateFunctionOperatorTest
> Error: 02:27:17 02:27:17.025 [ERROR] 
> org.apache.flink.table.runtime.operators.python.aggregate.arrow.stream.StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount
>   Time elapsed: 0.3 s  <<< FAILURE!
> Mar 09 02:27:17 java.lang.AssertionError: 
> Mar 09 02:27:17 
> Mar 09 02:27:17 Expected size: 8 but was: 6 in:
> Mar 09 02:27:17 [Record @ (undef) : 
> +I(c1,0,1969-12-31T23:59:55,1970-01-01T00:00:05),
> Mar 09 02:27:17 Record @ (undef) : 
> +I(c2,3,1969-12-31T23:59:55,1970-01-01T00:00:05),
> Mar 09 02:27:17 Record @ (undef) : 
> +I(c2,3,1970-01-01T00:00,1970-01-01T00:00:10),
> Mar 09 02:27:17 Record @ (undef) : 
> +I(c1,0,1970-01-01T00:00,1970-01-01T00:00:10),
> Mar 09 02:27:17 Watermark @ 1,
> Mar 09 02:27:17 Watermark @ 2]
> Mar 09 02:27:17   at 
> org.apache.flink.table.runtime.util.RowDataHarnessAssertor.assertOutputEquals(RowDataHarnessAssertor.java:110)
> Mar 09 02:27:17   at 
> org.apache.flink.table.runtime.util.RowDataHarnessAssertor.assertOutputEquals(RowDataHarnessAssertor.java:70)
> Mar 09 02:27:17   at 
> org.apache.flink.table.runtime.operators.python.aggregate.arrow.ArrowPythonAggregateFunctionOperatorTestBase.assertOutputEquals(ArrowPythonAggregateFunctionOperatorTestBase.java:62)
> Mar 09 02:27:17   at 
> org.apache.flink.table.runtime.operators.python.aggregate.arrow.stream.StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount(StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.java:326)
> Mar 09 02:27:17   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> [...]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35041) IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration failed

2024-05-02 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843018#comment-17843018
 ] 

Ryan Skraba commented on FLINK-35041:
-

1.20 test_cron_hadoop313 core 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59303&view=logs&j=d89de3df-4600-5585-dadc-9bbc9a5e661c&t=be5a4b15-4b23-56b1-7582-795f58a645a2&l=9001

> IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration failed
> --
>
> Key: FLINK-35041
> URL: https://issues.apache.org/jira/browse/FLINK-35041
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Affects Versions: 1.20.0
>Reporter: Weijie Guo
>Assignee: Feifan Wang
>Priority: Blocker
>
> {code:java}
> Apr 08 03:22:45 03:22:45.450 [ERROR] 
> org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration
>  -- Time elapsed: 0.034 s <<< FAILURE!
> Apr 08 03:22:45 org.opentest4j.AssertionFailedError: 
> Apr 08 03:22:45 
> Apr 08 03:22:45 expected: false
> Apr 08 03:22:45  but was: true
> Apr 08 03:22:45   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> Apr 08 03:22:45   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> Apr 08 03:22:45   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(K.java:45)
> Apr 08 03:22:45   at 
> org.apache.flink.runtime.state.DiscardRecordedStateObject.verifyDiscard(DiscardRecordedStateObject.java:34)
> Apr 08 03:22:45   at 
> org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration(IncrementalRemoteKeyedStateHandleTest.java:211)
> Apr 08 03:22:45   at java.lang.reflect.Method.invoke(Method.java:498)
> Apr 08 03:22:45   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> {code}
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58782&view=logs&j=77a9d8e1-d610-59b3-fc2a-4766541e0e33&t=125e07e7-8de0-5c6c-a541-a567415af3ef&l=9238]
>  





[jira] [Commented] (FLINK-34273) git fetch fails

2024-05-02 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843017#comment-17843017
 ] 

Ryan Skraba commented on FLINK-34273:
-

1.20 test_cron_adaptive_scheduler tests 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59303&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=bc77b88f-20e6-5fb3-ac3b-0b6efcca48c5&l=1068

> git fetch fails
> ---
>
> Key: FLINK-34273
> URL: https://issues.apache.org/jira/browse/FLINK-34273
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI, Test Infrastructure
>Affects Versions: 1.19.0, 1.18.1, 1.20.0
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: test-stability
>
> We've seen multiple {{git fetch}} failures. I assume this to be an 
> infrastructure issue. This Jira issue is for documentation purposes.
> {code:java}
> error: RPC failed; curl 18 transfer closed with outstanding read data 
> remaining
> error: 5211 bytes of body are still expected
> fetch-pack: unexpected disconnect while reading sideband packet
> fatal: early EOF
> fatal: fetch-pack: invalid index-pack output {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57080&view=logs&j=0e7be18f-84f2-53f0-a32d-4a5e4a174679&t=5d6dc3d3-393d-5111-3a40-c6a5a36202e6&l=667





[jira] [Created] (FLINK-35284) Streaming File Sink end-to-end test times out

2024-05-02 Thread Ryan Skraba (Jira)
Ryan Skraba created FLINK-35284:
---

 Summary: Streaming File Sink end-to-end test times out
 Key: FLINK-35284
 URL: https://issues.apache.org/jira/browse/FLINK-35284
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.20.0
Reporter: Ryan Skraba


1.20 e2e_2_cron_adaptive_scheduler 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59303&view=logs&j=fb37c667-81b7-5c22-dd91-846535e99a97&t=011e961e-597c-5c96-04fe-7941c8b83f23&l=3076

{code}
May 01 01:08:42 Test (pid: 127498) did not finish after 900 seconds.
May 01 01:08:42 Printing Flink logs and killing it:
{code}

This looks like a consequence of hundreds of {{RecipientUnreachableException}}s 
like: 

{code}
2024-05-01 00:55:00,496 WARN  
org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer [] 
- Slot allocation for allocation 2ec550d8331cd53c32fd899e1e9a0fa5 for job 
5654b195450b352be998673f1637fc43 failed.
org.apache.flink.runtime.rpc.exceptions.RecipientUnreachableException: Could 
not send message [RemoteRpcInvocation(TaskExecutorGateway.requestSlot(SlotID, 
JobID, AllocationID, ResourceProfile, String, ResourceManagerId, Time))] from 
sender [Actor[pekko://flink/temp/taskmanager_0$De]] to recipient 
[Actor[pekko.ssl.tcp://flink@localhost:40665/user/rpc/taskmanager_0#-299862847]],
 because the recipient is unreachable. This can either mean that the recipient 
has been terminated or that the remote RpcService is currently not reachable.
at 
org.apache.flink.runtime.rpc.pekko.DeadLettersActor.handleDeadLetter(DeadLettersActor.java:61)
 ~[flink-rpc-akkafe85d469-8ced-4732-922e-62c82b554871.jar:1.20-SNAPSHOT]
at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) 
~[flink-rpc-akkafe85d469-8ced-4732-922e-62c82b554871.jar:1.20-SNAPSHOT]
at 
org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) 
~[flink-rpc-akkafe85d469-8ced-4732-922e-62c82b554871.jar:1.20-SNAPSHOT]
at scala.PartialFunction.applyOrElse(PartialFunction.scala:127) 
~[flink-rpc-akkafe85d469-8ced-4732-922e-62c82b554871.jar:1.20-SNAPSHOT]
{code}
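To put a number on "hundreds", a rough count of the matching log lines is enough. The sketch below is only an illustration: the class name {{ExceptionCounter}} and the log file path passed as the first argument are hypothetical, and it counts every line that mentions the exception, so it may overcount if a single failure mentions the name more than once.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of Flink or its CI tooling.
final class ExceptionCounter {

    // Counts the lines that contain the given token.
    static long countLinesContaining(List<String> lines, String token) {
        return lines.stream().filter(line -> line.contains(token)).count();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a downloaded Flink log file (assumed input).
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(
                countLinesContaining(lines, "RecipientUnreachableException"));
    }
}
```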







[jira] [Commented] (FLINK-34645) StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount fails

2024-04-30 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842430#comment-17842430
 ] 

Ryan Skraba commented on FLINK-34645:
-

1.18 Java 11 / Test (module: misc) 
https://github.com/apache/flink/actions/runs/8872328847/job/24356773170#step:10:21780

Again, a slightly different output was received, but in the same test:
{code}
Apr 29 02:31:31 Expected size: 6 but was: 4 in:
Apr 29 02:31:31 [Record @ (undef) : 
+I(c1,0,1969-12-31T23:59:55,1970-01-01T00:00:05),
Apr 29 02:31:31 Record @ (undef) : 
+I(c1,0,1970-01-01T00:00,1970-01-01T00:00:10),
Apr 29 02:31:31 Record @ (undef) : 
+I(c1,1,1970-01-01T00:00:05,1970-01-01T00:00:15),
Apr 29 02:31:31 Record @ (undef) : 
+I(c1,2,1970-01-01T00:00:10,1970-01-01T00:00:20)]
Apr 29 02:31:31 at 
org.apache.flink.table.runtime.util.RowDataHarnessAssertor.assertOutputEquals(RowDataHarnessAssertor.java:110)
Apr 29 02:31:31 at 
org.apache.flink.table.runtime.util.RowDataHarnessAssertor.assertOutputEquals(RowDataHarnessAssertor.java:70)
Apr 29 02:31:31 at 
org.apache.flink.table.runtime.operators.python.aggregate.arrow.ArrowPythonAggregateFunctionOperatorTestBase.assertOutputEquals(ArrowPythonAggregateFunctionOperatorTestBase.java:62)
Apr 29 02:31:31 at 
org.apache.flink.table.runtime.operators.python.aggregate.arrow.batch.BatchArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount(BatchArrowPythonGroupWindowAggregateFunctionOperatorTest.java:209)
{code}

> StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount
>  fails
> 
>
> Key: FLINK-34645
> URL: https://issues.apache.org/jira/browse/FLINK-34645
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.18.1
>Reporter: Matthias Pohl
>Priority: Major
>  Labels: github-actions, test-stability
>
> {code}
> Error: 02:27:17 02:27:17.025 [ERROR] Tests run: 3, Failures: 1, Errors: 0, 
> Skipped: 0, Time elapsed: 0.658 s <<< FAILURE! - in 
> org.apache.flink.table.runtime.operators.python.aggregate.arrow.stream.StreamArrowPythonGroupWindowAggregateFunctionOperatorTest
> Error: 02:27:17 02:27:17.025 [ERROR] 
> org.apache.flink.table.runtime.operators.python.aggregate.arrow.stream.StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount
>   Time elapsed: 0.3 s  <<< FAILURE!
> Mar 09 02:27:17 java.lang.AssertionError: 
> Mar 09 02:27:17 
> Mar 09 02:27:17 Expected size: 8 but was: 6 in:
> Mar 09 02:27:17 [Record @ (undef) : 
> +I(c1,0,1969-12-31T23:59:55,1970-01-01T00:00:05),
> Mar 09 02:27:17 Record @ (undef) : 
> +I(c2,3,1969-12-31T23:59:55,1970-01-01T00:00:05),
> Mar 09 02:27:17 Record @ (undef) : 
> +I(c2,3,1970-01-01T00:00,1970-01-01T00:00:10),
> Mar 09 02:27:17 Record @ (undef) : 
> +I(c1,0,1970-01-01T00:00,1970-01-01T00:00:10),
> Mar 09 02:27:17 Watermark @ 1,
> Mar 09 02:27:17 Watermark @ 2]
> Mar 09 02:27:17   at 
> org.apache.flink.table.runtime.util.RowDataHarnessAssertor.assertOutputEquals(RowDataHarnessAssertor.java:110)
> Mar 09 02:27:17   at 
> org.apache.flink.table.runtime.util.RowDataHarnessAssertor.assertOutputEquals(RowDataHarnessAssertor.java:70)
> Mar 09 02:27:17   at 
> org.apache.flink.table.runtime.operators.python.aggregate.arrow.ArrowPythonAggregateFunctionOperatorTestBase.assertOutputEquals(ArrowPythonAggregateFunctionOperatorTestBase.java:62)
> Mar 09 02:27:17   at 
> org.apache.flink.table.runtime.operators.python.aggregate.arrow.stream.StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.testFinishBundleTriggeredByCount(StreamArrowPythonGroupWindowAggregateFunctionOperatorTest.java:326)
> Mar 09 02:27:17   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> [...]
> {code}





[jira] [Updated] (FLINK-35276) SortCodeGeneratorTest.testMultiKeys fails on negative zero

2024-04-30 Thread Ryan Skraba (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Skraba updated FLINK-35276:

Attachment: job-logs.txt

> SortCodeGeneratorTest.testMultiKeys fails on negative zero
> --
>
> Key: FLINK-35276
> URL: https://issues.apache.org/jira/browse/FLINK-35276
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.20.0, 1.19.1
>Reporter: Ryan Skraba
>Priority: Critical
>  Labels: test-stability
> Attachments: job-logs.txt
>
>
> 1.19 AdaptiveScheduler / Test (module: table) 
> [https://github.com/apache/flink/actions/runs/8864296211/job/24339523745#step:10:10757]
> SortCodeGeneratorTest can fail if one of the generated random row values is 
> -0.0f.
> {code:java}
> Apr 28 02:38:03 expect: +I(,SqlRawValue{?},0.0,false); actual: 
> +I(,SqlRawValue{?},-0.0,false)
> Apr 28 02:38:03 expect: +I(,SqlRawValue{?},-0.0,false); actual: 
> +I(,SqlRawValue{?},0.0,false)
> ...
> 
> ...
> Apr 28 02:38:04 expect: +I(,null,4.9695407E17,false); actual: 
> +I(,null,4.9695407E17,false)
> Apr 28 02:38:04 expect: +I(,null,-3.84924672E18,false); actual: 
> +I(,null,-3.84924672E18,false)
> Apr 28 02:38:04 types: [[RAW('java.lang.Integer', ?), FLOAT, BOOLEAN]]
> Apr 28 02:38:04 keys: [0, 1]] 
> Apr 28 02:38:04 expected: 0.0f
> Apr 28 02:38:04  but was: -0.0f
> Apr 28 02:38:04   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> Apr 28 02:38:04   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> Apr 28 02:38:04   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> Apr 28 02:38:04   at 
> org.apache.flink.table.planner.codegen.SortCodeGeneratorTest.testInner(SortCodeGeneratorTest.java:632)
> Apr 28 02:38:04   at 
> org.apache.flink.table.planner.codegen.SortCodeGeneratorTest.testMultiKeys(SortCodeGeneratorTest.java:143)
> Apr 28 02:38:04   at java.lang.reflect.Method.invoke(Method.java:498)
> {code}
> In the test code, this is extremely unlikely to occur (one in 2²⁴?) but *has* 
> happened at this line (when {{rnd.nextFloat()}} is {{0.0f}} and 
> {{rnd.nextLong()}} is negative):
> [https://github.com/apache/flink/blob/e7ce0a2969633168b9395c683921aa49362ad7a4/flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/codegen/SortCodeGeneratorTest.java#L255]
> We can reproduce the failure by changing how likely {{0.0f}} is to be 
> generated at that line.





[jira] [Created] (FLINK-35276) SortCodeGeneratorTest.testMultiKeys fails on negative zero

2024-04-30 Thread Ryan Skraba (Jira)
Ryan Skraba created FLINK-35276:
---

 Summary: SortCodeGeneratorTest.testMultiKeys fails on negative zero
 Key: FLINK-35276
 URL: https://issues.apache.org/jira/browse/FLINK-35276
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.20.0, 1.19.1
Reporter: Ryan Skraba


1.19 AdaptiveScheduler / Test (module: table) 
[https://github.com/apache/flink/actions/runs/8864296211/job/24339523745#step:10:10757]

SortCodeGeneratorTest can fail if one of the generated random row values is 
-0.0f.
{code:java}
Apr 28 02:38:03 expect: +I(,SqlRawValue{?},0.0,false); actual: 
+I(,SqlRawValue{?},-0.0,false)
Apr 28 02:38:03 expect: +I(,SqlRawValue{?},-0.0,false); actual: 
+I(,SqlRawValue{?},0.0,false)
...

...
Apr 28 02:38:04 expect: +I(,null,4.9695407E17,false); actual: 
+I(,null,4.9695407E17,false)
Apr 28 02:38:04 expect: +I(,null,-3.84924672E18,false); actual: 
+I(,null,-3.84924672E18,false)
Apr 28 02:38:04 types: [[RAW('java.lang.Integer', ?), FLOAT, BOOLEAN]]
Apr 28 02:38:04 keys: [0, 1]] 
Apr 28 02:38:04 expected: 0.0f
Apr 28 02:38:04  but was: -0.0f
Apr 28 02:38:04 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Apr 28 02:38:04 at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
Apr 28 02:38:04 at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
Apr 28 02:38:04 at 
org.apache.flink.table.planner.codegen.SortCodeGeneratorTest.testInner(SortCodeGeneratorTest.java:632)
Apr 28 02:38:04 at 
org.apache.flink.table.planner.codegen.SortCodeGeneratorTest.testMultiKeys(SortCodeGeneratorTest.java:143)
Apr 28 02:38:04 at java.lang.reflect.Method.invoke(Method.java:498)
{code}

In the test code, this is extremely unlikely to occur (one in 2²⁴?) but *has* 
happened at this line (when {{rnd.nextFloat()}} is {{0.0f}} and 
{{rnd.nextLong()}} is negative):

[https://github.com/apache/flink/blob/e7ce0a2969633168b9395c683921aa49362ad7a4/flink-table/flink-table-planner/src/test/java/org/apache/flink/table/planner/codegen/SortCodeGeneratorTest.java#L255]

We can reproduce the failure by changing how likely {{0.0f}} is to be generated 
at that line.
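
As a minimal illustration of why the two zeros trip up the comparison, the sketch below shows how Java treats {{0.0f}} and {{-0.0f}}. It is not taken from SortCodeGeneratorTest: the class name {{NegativeZeroDemo}} is hypothetical, and the multiplication only assumes, from the description above, that a zero float ends up scaled by a negative long.

```java
// Illustrative only: not code from SortCodeGeneratorTest.
final class NegativeZeroDemo {

    // True only for -0.0f: compares the raw IEEE 754 bit patterns.
    static boolean isNegativeZero(float f) {
        return Float.floatToIntBits(f) == Float.floatToIntBits(-0.0f);
    }

    public static void main(String[] args) {
        float positive = 0.0f;
        // Assumption: a zero float multiplied by a negative long yields -0.0f.
        float negative = 0.0f * -5L;

        // The product really is negative zero.
        System.out.println(isNegativeZero(negative));                 // true
        // Primitive == ignores the sign of zero.
        System.out.println(positive == negative);                     // true
        // Float.compare orders 0.0f after -0.0f, so the result is > 0.
        System.out.println(Float.compare(positive, negative) > 0);    // true
        // Boxed equality distinguishes the two zeros.
        System.out.println(Float.valueOf(positive).equals(negative)); // false
    }
}
```

So two code paths can disagree on whether the values are equal, depending on whether they use primitive comparison or {{Float.compare}} / {{equals}}.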





[jira] [Commented] (FLINK-35041) IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration failed

2024-04-30 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842292#comment-17842292
 ] 

Ryan Skraba commented on FLINK-35041:
-

1.20 test_ci_core 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59281&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=8885

> IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration failed
> --
>
> Key: FLINK-35041
> URL: https://issues.apache.org/jira/browse/FLINK-35041
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / CI
>Affects Versions: 1.20.0
>Reporter: Weijie Guo
>Assignee: Feifan Wang
>Priority: Blocker
>
> {code:java}
> Apr 08 03:22:45 03:22:45.450 [ERROR] 
> org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration
>  -- Time elapsed: 0.034 s <<< FAILURE!
> Apr 08 03:22:45 org.opentest4j.AssertionFailedError: 
> Apr 08 03:22:45 
> Apr 08 03:22:45 expected: false
> Apr 08 03:22:45  but was: true
> Apr 08 03:22:45   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> Apr 08 03:22:45   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> Apr 08 03:22:45   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(K.java:45)
> Apr 08 03:22:45   at 
> org.apache.flink.runtime.state.DiscardRecordedStateObject.verifyDiscard(DiscardRecordedStateObject.java:34)
> Apr 08 03:22:45   at 
> org.apache.flink.runtime.state.IncrementalRemoteKeyedStateHandleTest.testSharedStateReRegistration(IncrementalRemoteKeyedStateHandleTest.java:211)
> Apr 08 03:22:45   at java.lang.reflect.Method.invoke(Method.java:498)
> Apr 08 03:22:45   at 
> java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
> Apr 08 03:22:45   at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
> {code}
> [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58782&view=logs&j=77a9d8e1-d610-59b3-fc2a-4766541e0e33&t=125e07e7-8de0-5c6c-a541-a567415af3ef&l=9238]
>  





[jira] [Commented] (FLINK-34227) Job doesn't disconnect from ResourceManager

2024-04-22 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839630#comment-17839630
 ] 

Ryan Skraba commented on FLINK-34227:
-

1.18 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/8769422951/job/24065034854#step:10:14503
1.20 AdaptiveScheduler / Test (module: table) 
https://github.com/apache/flink/actions/runs/8777471561/job/24082689462#step:10:13087

> Job doesn't disconnect from ResourceManager
> ---
>
> Key: FLINK-34227
> URL: https://issues.apache.org/jira/browse/FLINK-34227
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.19.0, 1.18.1
>Reporter: Matthias Pohl
>Assignee: Matthias Pohl
>Priority: Critical
>  Labels: github-actions, pull-request-available, test-stability
> Attachments: FLINK-34227.7e7d69daebb438b8d03b7392c9c55115.log, 
> FLINK-34227.log
>
>
> https://github.com/XComp/flink/actions/runs/7634987973/job/20800205972#step:10:14557
> {code}
> [...]
> "main" #1 prio=5 os_prio=0 tid=0x7f4b7000 nid=0x24ec0 waiting on 
> condition [0x7fccce1eb000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xbdd52618> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2131)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2099)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2077)
>   at 
> org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:876)
>   at 
> org.apache.flink.table.planner.runtime.stream.sql.WindowDistinctAggregateITCase.testHopWindow_Cube(WindowDistinctAggregateITCase.scala:550)
> [...]
> {code}





[jira] [Commented] (FLINK-35175) HadoopDataInputStream can't compile with Hadoop 3.2.3

2024-04-22 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839628#comment-17839628
 ] 

Ryan Skraba commented on FLINK-35175:
-

Not including the fix: 
1.20 Hadoop 3.1.3 / Compile 
https://github.com/apache/flink/actions/runs/8747381080/job/24005737445#step:6:1560
1.20 Hadoop 3.1.3 / Compile 
https://github.com/apache/flink/actions/runs/8769422914/job/24064887346#step:6:1759


> HadoopDataInputStream can't compile with Hadoop 3.2.3
> -
>
> Key: FLINK-35175
> URL: https://issues.apache.org/jira/browse/FLINK-35175
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.20.0
>Reporter: Ryan Skraba
>Assignee: Hangxiang Yu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.20.0
>
>
> Unfortunately, introduced in FLINK-35045: 
> [PREADBYTEBUFFER|https://github.com/apache/flink/commit/a312a3bdd258e0ff7d6f94e979b32e2bc762b82f#diff-3ed57be01895ba0f792110e40f4283427c55528f11a5105b4bf34ebd4e6fef0dR182]
>  was added in Hadoop releases 
> [3.3.0|https://github.com/apache/hadoop/blob/rel/release-3.3.0/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/StreamCapabilities.java#L72]
>  and 
> [2.10.0|https://github.com/apache/hadoop/blob/rel/release-2.10.0/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/StreamCapabilities.java#L72].
> It doesn't exist in flink.hadoop.version 
> [3.2.3|https://github.com/apache/hadoop/blob/rel/release-3.2.3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/StreamCapabilities.java],
>  which we are using in end-to-end tests.
> {code:java}
> 00:23:55.093 [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile 
> (default-compile) on project flink-hadoop-fs: Compilation failure: 
> Compilation failure: 
> 00:23:55.093 [ERROR] 
> /home/vsts/work/1/s/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopDataInputStream.java:[151,63]
>  cannot find symbol
> 00:23:55.094 [ERROR]   symbol:   variable READBYTEBUFFER
> 00:23:55.094 [ERROR]   location: interface 
> org.apache.hadoop.fs.StreamCapabilities
> 00:23:55.094 [ERROR] 
> /home/vsts/work/1/s/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopDataInputStream.java:[182,63]
>  cannot find symbol
> 00:23:55.094 [ERROR]   symbol:   variable PREADBYTEBUFFER
> 00:23:55.094 [ERROR]   location: interface 
> org.apache.hadoop.fs.StreamCapabilities
> 00:23:55.094 [ERROR] 
> /home/vsts/work/1/s/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopDataInputStream.java:[183,43]
>  incompatible types: long cannot be converted to 
> org.apache.hadoop.io.ByteBufferPool
> 00:23:55.094 [ERROR] -> [Help 1] {code}
> * 1.20 compile_cron_hadoop313 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=59012&view=logs&j=87489130-75dc-54e4-1f45-80c30aa367a3&t=73da6d75-f30d-5d5a-acbe-487a9dcff678&l=3630
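
One general way to avoid a compile-time dependency on a constant that only exists in some library versions is to look it up reflectively at runtime. The sketch below illustrates that technique with the standard library only; it is not necessarily how this issue was fixed, and the class name {{OptionalConstant}} and method {{lookup}} are hypothetical.

```java
import java.lang.reflect.Field;
import java.util.Optional;

// Hypothetical helper: resolves a public static String constant by name,
// so the calling code still compiles when the constant is absent.
final class OptionalConstant {

    static Optional<String> lookup(String className, String fieldName) {
        try {
            Field field = Class.forName(className).getField(fieldName);
            Object value = field.get(null); // null receiver: static field
            return value instanceof String
                    ? Optional.of((String) value)
                    : Optional.empty();
        } catch (ReflectiveOperationException | NullPointerException e) {
            // Class or field is missing, inaccessible, or not static.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Prints Optional.empty unless a Hadoop version that defines the
        // constant is on the classpath.
        System.out.println(
                lookup("org.apache.hadoop.fs.StreamCapabilities", "PREADBYTEBUFFER"));
    }
}
```

A caller can then treat an empty result as "capability not supported" instead of failing to compile against older Hadoop versions.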





[jira] [Commented] (FLINK-35002) GitHub action request timeout to ArtifactService

2024-04-22 Thread Ryan Skraba (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839631#comment-17839631
 ] 

Ryan Skraba commented on FLINK-35002:
-

1.20 Java 21 / Compile 
https://github.com/apache/flink/commit/a4c71c8d021f5c07c81e69369139d4455da475ca/checks/24082452512/logs
1.20 AdaptiveScheduler / Test (module: python) 
https://github.com/apache/flink/commit/a4c71c8d021f5c07c81e69369139d4455da475ca/checks/24082689366/logs


> GitHub action request timeout to ArtifactService
> -
>
> Key: FLINK-35002
> URL: https://issues.apache.org/jira/browse/FLINK-35002
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Reporter: Ryan Skraba
>Priority: Major
>  Labels: github-actions, test-stability
>
> A timeout can occur when uploading a successfully built artifact:
>  * [https://github.com/apache/flink/actions/runs/8516411871/job/23325392650]
> {code:java}
> 2024-04-02T02:20:15.6355368Z With the provided path, there will be 1 file 
> uploaded
> 2024-04-02T02:20:15.6360133Z Artifact name is valid!
> 2024-04-02T02:20:15.6362872Z Root directory input is valid!
> 2024-04-02T02:20:20.6975036Z Attempt 1 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 3000 ms...
> 2024-04-02T02:20:28.7084937Z Attempt 2 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 4785 ms...
> 2024-04-02T02:20:38.5015936Z Attempt 3 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 7375 ms...
> 2024-04-02T02:20:50.8901508Z Attempt 4 of 5 failed with error: Request 
> timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. 
> Retrying request in 14988 ms...
> 2024-04-02T02:21:10.9028438Z ##[error]Failed to CreateArtifact: Failed to 
> make request after 5 attempts: Request timeout: 
> /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact
> 2024-04-02T02:22:59.9893296Z Post job cleanup.
> 2024-04-02T02:22:59.9958844Z Post job cleanup. {code}
> (This is unlikely to be something we can fix, but we can track it.)
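
The log above shows a bounded retry loop with growing delays between attempts. As a rough sketch of that general pattern (not the GitHub Actions implementation, whose delays also appear to include jitter), a retry with exponential backoff might look like the following. The class name {{RetryWithBackoff}}, its parameters, and the plain doubling rule are all assumptions made for illustration.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of bounded retries with exponential backoff.
final class RetryWithBackoff {

    // Delay before the next attempt: baseMillis doubled per earlier failure.
    static long delayMillis(long baseMillis, int failedAttempts) {
        return baseMillis << (failedAttempts - 1);
    }

    static <T> T call(Callable<T> action, int maxAttempts, long baseMillis) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // remember the failure and fall through to the wait
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis(baseMillis, attempt));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", ie);
                }
            }
        }
        throw new IllegalStateException("Failed after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice with a simulated timeout, then succeeds.
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("Request timeout");
            }
            return "uploaded";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

When every attempt times out, as in the log, a loop like this can only surface the last error, which matches the note that the failure is something to track rather than fix.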




