[jira] [Commented] (FLINK-34582) release build tools lost the newly added py3.11 packages for mac
[ https://issues.apache.org/jira/browse/FLINK-34582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849185#comment-17849185 ]

Matthias Pohl commented on FLINK-34582:
---------------------------------------

You're checking [~hxb]'s fork, where the {{master}} branch doesn't seem to be up-to-date. [apache/flink:flink-python/dev/build-wheels.sh|https://github.com/apache/flink/blob/master/flink-python/dev/build-wheels.sh#L19-L26] does, indeed, have 3.11 added to the Python version list.

> release build tools lost the newly added py3.11 packages for mac
> ----------------------------------------------------------------
>
>                 Key: FLINK-34582
>                 URL: https://issues.apache.org/jira/browse/FLINK-34582
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.19.0, 1.20.0
>            Reporter: lincoln lee
>            Assignee: Xingbo Huang
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.19.0, 1.20.0
>
>         Attachments: image-2024-03-07-10-39-49-341.png
>
> During the 1.19.0-rc1 build, creating binaries via tools/releasing/create_binary_release.sh lost the 2 newly added py3.11 packages for mac.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (FLINK-34672) HA deadlock between JobMasterServiceLeadershipRunner and DefaultLeaderElectionService
[ https://issues.apache.org/jira/browse/FLINK-34672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848648#comment-17848648 ]

Matthias Pohl commented on FLINK-34672:
---------------------------------------

I'm still trying to find a reviewer. It's on my plate. But it's not a blocker, because the issue already existed in older versions of Flink:
{quote}
I also verified that this is not something that was introduced in Flink 1.18 with the FLIP-285 changes. AFAIS, it can also happen in 1.17- (I didn't check the pre-FLINK-24038 code but only looked into release-1.17).
{quote}

> HA deadlock between JobMasterServiceLeadershipRunner and DefaultLeaderElectionService
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-34672
>                 URL: https://issues.apache.org/jira/browse/FLINK-34672
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0
>            Reporter: Chesnay Schepler
>            Assignee: Matthias Pohl
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.18.2, 1.20.0, 1.19.1
>
>
> We recently observed a deadlock in the JM within the HA system (see below for the thread dump).
> [~mapohl] and I looked a bit into it, and there appears to be a race condition when leadership is revoked while a JobMaster is being started.
> It appears to be caused by {{JobMasterServiceLeadershipRunner#createNewJobMasterServiceProcess}} forwarding futures while holding a lock; depending on whether the forwarded future is already complete, the next stage may or may not run while holding that same lock.
> We haven't determined yet whether we should be holding that lock or not.
> {code}
> "DefaultLeaderElectionService-leadershipOperationExecutor-thread-1" #131 daemon prio=5 os_prio=0 cpu=157.44ms elapsed=78749.65s tid=0x7f531f43d000 nid=0x19d waiting for monitor entry [0x7f53084fd000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.runIfStateRunning(JobMasterServiceLeadershipRunner.java:462)
>     - waiting to lock <0xf1c0e088> (a java.lang.Object)
>     at org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.revokeLeadership(JobMasterServiceLeadershipRunner.java:397)
>     at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.notifyLeaderContenderOfLeadershipLoss(DefaultLeaderElectionService.java:484)
>     at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1252/0x000840ddec40.accept(Unknown Source)
>     at java.util.HashMap.forEach(java.base@11.0.22/HashMap.java:1337)
>     at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.onRevokeLeadershipInternal(DefaultLeaderElectionService.java:452)
>     at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1251/0x000840dcf840.run(Unknown Source)
>     at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.lambda$runInLeaderEventThread$3(DefaultLeaderElectionService.java:549)
>     - locked <0xf0e3f4d8> (a java.lang.Object)
>     at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1075/0x000840c23040.run(Unknown Source)
>     at java.util.concurrent.CompletableFuture$AsyncRun.run(java.base@11.0.22/CompletableFuture.java:1736)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.22/ThreadPoolExecutor.java:1128)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.22/ThreadPoolExecutor.java:628)
>     at java.lang.Thread.run(java.base@11.0.22/Thread.java:829)
> {code}
> {code}
> "jobmanager-io-thread-1" #636 daemon prio=5 os_prio=0 cpu=125.56ms elapsed=78699.01s tid=0x7f5321c6e800 nid=0x396 waiting for monitor entry [0x7f530567d000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.hasLeadership(DefaultLeaderElectionService.java:366)
>     - waiting to lock <0xf0e3f4d8> (a java.lang.Object)
>     at org.apache.flink.runtime.leaderelection.DefaultLeaderElection.hasLeadership(DefaultLeaderElection.java:52)
>     at org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.isValidLeader(JobMasterServiceLeadershipRunner.java:509)
>     at org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.lambda$forwardIfValidLeader$15(JobMasterServiceLeadershipRunner.java:520)
>     - locked <0xf1c0e088> (a java.lang.Object)
>     at
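The race described in the ticket hinges on a subtle property of {{CompletableFuture}}: a dependent stage registered on an already-completed future runs synchronously on the registering thread, i.e. while that thread still holds any lock it has taken, whereas a stage on a pending future runs later on the completing thread. The following is a hypothetical, minimal sketch of that pattern (class and method names are invented for illustration; this is not the actual Flink code):

```java
import java.util.concurrent.CompletableFuture;

public class ForwardUnderLock {
    private static final Object lock = new Object();
    private static volatile boolean stageRanUnderLock;

    // Forward a future to a dependent stage while holding `lock`, mirroring
    // the pattern attributed to createNewJobMasterServiceProcess. Whether the
    // stage executes under the lock depends on whether `f` is complete yet.
    static void forwardWhileLocked(CompletableFuture<Void> f) {
        synchronized (lock) {
            f.thenRun(() -> stageRanUnderLock = Thread.holdsLock(lock));
        }
    }

    public static void main(String[] args) {
        // Case 1: already-complete future -> thenRun executes synchronously
        // on the calling thread, i.e. while `lock` is still held.
        forwardWhileLocked(CompletableFuture.completedFuture(null));
        if (!stageRanUnderLock) {
            throw new AssertionError("expected stage to run under the lock");
        }

        // Case 2: pending future -> the stage runs later, on whichever thread
        // completes the future, after forwardWhileLocked() released the lock.
        CompletableFuture<Void> pending = new CompletableFuture<>();
        forwardWhileLocked(pending);
        pending.complete(null); // runs the stage here, without holding `lock`
        if (stageRanUnderLock) {
            throw new AssertionError("expected stage to run without the lock");
        }
        System.out.println("ok");
    }
}
```

If such a stage then tries to take a second lock that another thread holds while waiting for the first, the two threads block each other, which matches the shape of the two thread dumps above.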
[jira] [Assigned] (FLINK-20402) Migrate test_tpch.sh
[ https://issues.apache.org/jira/browse/FLINK-20402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Pohl reassigned FLINK-20402:
-------------------------------------

    Assignee: Muhammet Orazov

> Migrate test_tpch.sh
> --------------------
>
>                 Key: FLINK-20402
>                 URL: https://issues.apache.org/jira/browse/FLINK-20402
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table SQL / Ecosystem, Tests
>            Reporter: Jark Wu
>            Assignee: Muhammet Orazov
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (FLINK-20392) Migrating bash e2e tests to Java/Docker
[ https://issues.apache.org/jira/browse/FLINK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846924#comment-17846924 ]

Matthias Pohl commented on FLINK-20392:
---------------------------------------

Sure, sounds reasonable. Feel free to update it.

> Migrating bash e2e tests to Java/Docker
> ---------------------------------------
>
>                 Key: FLINK-20392
>                 URL: https://issues.apache.org/jira/browse/FLINK-20392
>             Project: Flink
>          Issue Type: Technical Debt
>          Components: Test Infrastructure, Tests
>            Reporter: Matthias Pohl
>            Priority: Minor
>              Labels: auto-deprioritized-major, auto-deprioritized-minor, starter
>
> This Jira issue serves as an umbrella ticket for single e2e test migration tasks. This should enable us to migrate all bash-based e2e tests step-by-step.
> The goal is to utilize the e2e test framework (see [flink-end-to-end-tests-common|https://github.com/apache/flink/tree/master/flink-end-to-end-tests/flink-end-to-end-tests-common]).
> Ideally, the test should use Docker containers as much as possible to disconnect the execution from the environment. A good source to achieve that is [testcontainers.org|https://www.testcontainers.org/].
> The related ML discussion is [Stop adding new bash-based e2e tests to Flink|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Stop-adding-new-bash-based-e2e-tests-to-Flink-td46607.html].

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (FLINK-20392) Migrating bash e2e tests to Java/Docker
[ https://issues.apache.org/jira/browse/FLINK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846838#comment-17846838 ]

Matthias Pohl commented on FLINK-20392:
---------------------------------------

This discussion feels similar to our efforts around migrating to JUnit5 and assertj as the standard for JUnit tests. It cost (and is still costing) quite a bit of resources, with the risk of missing things when reviewing the tests. That is why I still see value in just keeping both options around. That requires fewer resources, and we're not losing much. The pros and cons are still a good guideline for developers to decide on which technology to use if they are planning to create a new e2e test in Java. WDYT?

> Migrating bash e2e tests to Java/Docker
> ---------------------------------------
>
>                 Key: FLINK-20392
>                 URL: https://issues.apache.org/jira/browse/FLINK-20392
>             Project: Flink
>          Issue Type: Technical Debt
>          Components: Test Infrastructure, Tests
>            Reporter: Matthias Pohl
>            Priority: Minor
>              Labels: auto-deprioritized-major, auto-deprioritized-minor, starter
>
> This Jira issue serves as an umbrella ticket for single e2e test migration tasks. This should enable us to migrate all bash-based e2e tests step-by-step.
> The goal is to utilize the e2e test framework (see [flink-end-to-end-tests-common|https://github.com/apache/flink/tree/master/flink-end-to-end-tests/flink-end-to-end-tests-common]).
> Ideally, the test should use Docker containers as much as possible to disconnect the execution from the environment. A good source to achieve that is [testcontainers.org|https://www.testcontainers.org/].
> The related ML discussion is [Stop adding new bash-based e2e tests to Flink|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Stop-adding-new-bash-based-e2e-tests-to-Flink-td46607.html].

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (FLINK-20392) Migrating bash e2e tests to Java/Docker
[ https://issues.apache.org/jira/browse/FLINK-20392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846540#comment-17846540 ]

Matthias Pohl commented on FLINK-20392:
---------------------------------------

Thanks for the write-up. I'm just wondering whether we gain anything from only allowing one of the two approaches. What about allowing both options?

> Migrating bash e2e tests to Java/Docker
> ---------------------------------------
>
>                 Key: FLINK-20392
>                 URL: https://issues.apache.org/jira/browse/FLINK-20392
>             Project: Flink
>          Issue Type: Technical Debt
>          Components: Test Infrastructure, Tests
>            Reporter: Matthias Pohl
>            Priority: Minor
>              Labels: auto-deprioritized-major, auto-deprioritized-minor, starter
>
> This Jira issue serves as an umbrella ticket for single e2e test migration tasks. This should enable us to migrate all bash-based e2e tests step-by-step.
> The goal is to utilize the e2e test framework (see [flink-end-to-end-tests-common|https://github.com/apache/flink/tree/master/flink-end-to-end-tests/flink-end-to-end-tests-common]).
> Ideally, the test should use Docker containers as much as possible to disconnect the execution from the environment. A good source to achieve that is [testcontainers.org|https://www.testcontainers.org/].
> The related ML discussion is [Stop adding new bash-based e2e tests to Flink|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Stop-adding-new-bash-based-e2e-tests-to-Flink-td46607.html].

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Comment Edited] (FLINK-34324) s3_setup is called in test_file_sink.sh even if the common_s3.sh is not sourced
[ https://issues.apache.org/jira/browse/FLINK-34324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845238#comment-17845238 ]

Matthias Pohl edited comment on FLINK-34324 at 5/10/24 8:07 AM:
----------------------------------------------------------------

* master: [93526c2f3247598ce80854cf65dd4440eb5aaa43|https://github.com/apache/flink/commit/93526c2f3247598ce80854cf65dd4440eb5aaa43]
* 1.19: [8707c63ee147085671a9ae1b294854bac03fc914|https://github.com/apache/flink/commit/8707c63ee147085671a9ae1b294854bac03fc914]
* 1.18: [7d98ab060be82fe3684d15501b9eb83373303d18|https://github.com/apache/flink/commit/7d98ab060be82fe3684d15501b9eb83373303d18]

was (Author: mapohl):
* master
** [93526c2f3247598ce80854cf65dd4440eb5aaa43|https://github.com/apache/flink/commit/93526c2f3247598ce80854cf65dd4440eb5aaa43]
* 1.19
** [8707c63ee147085671a9ae1b294854bac03fc914|https://github.com/apache/flink/commit/8707c63ee147085671a9ae1b294854bac03fc914]
* 1.18
** [7d98ab060be82fe3684d15501b9eb83373303d18|https://github.com/apache/flink/commit/7d98ab060be82fe3684d15501b9eb83373303d18]

> s3_setup is called in test_file_sink.sh even if the common_s3.sh is not sourced
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-34324
>                 URL: https://issues.apache.org/jira/browse/FLINK-34324
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hadoop Compatibility, Tests
>    Affects Versions: 1.17.2, 1.19.0, 1.18.1
>            Reporter: Matthias Pohl
>            Assignee: Matthias Pohl
>            Priority: Major
>              Labels: pull-request-available, test-stability
>             Fix For: 1.18.2, 1.20.0, 1.19.1
>
>
> See example CI run from the FLINK-34150 PR:
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56570&view=logs&j=af184cdd-c6d8-5084-0b69-7e9c67b35f7a&t=0f3adb59-eefa-51c6-2858-3654d9e0749d&l=3191
> {code}
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_file_sink.sh: line 38: s3_setup: command not found
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Resolved] (FLINK-34324) s3_setup is called in test_file_sink.sh even if the common_s3.sh is not sourced
[ https://issues.apache.org/jira/browse/FLINK-34324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Pohl resolved FLINK-34324.
-----------------------------------
    Fix Version/s: 1.18.2
                   1.20.0
                   1.19.1
       Resolution: Fixed

* master
** [93526c2f3247598ce80854cf65dd4440eb5aaa43|https://github.com/apache/flink/commit/93526c2f3247598ce80854cf65dd4440eb5aaa43]
* 1.19
** [8707c63ee147085671a9ae1b294854bac03fc914|https://github.com/apache/flink/commit/8707c63ee147085671a9ae1b294854bac03fc914]
* 1.18
** [7d98ab060be82fe3684d15501b9eb83373303d18|https://github.com/apache/flink/commit/7d98ab060be82fe3684d15501b9eb83373303d18]

> s3_setup is called in test_file_sink.sh even if the common_s3.sh is not sourced
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-34324
>                 URL: https://issues.apache.org/jira/browse/FLINK-34324
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hadoop Compatibility, Tests
>    Affects Versions: 1.17.2, 1.19.0, 1.18.1
>            Reporter: Matthias Pohl
>            Assignee: Matthias Pohl
>            Priority: Major
>              Labels: pull-request-available, test-stability
>             Fix For: 1.18.2, 1.20.0, 1.19.1
>
>
> See example CI run from the FLINK-34150 PR:
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=56570&view=logs&j=af184cdd-c6d8-5084-0b69-7e9c67b35f7a&t=0f3adb59-eefa-51c6-2858-3654d9e0749d&l=3191
> {code}
> /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_file_sink.sh: line 38: s3_setup: command not found
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Assigned] (FLINK-34937) Apache Infra GHA policy update
[ https://issues.apache.org/jira/browse/FLINK-34937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Pohl reassigned FLINK-34937:
-------------------------------------

    Assignee: Matthias Pohl

> Apache Infra GHA policy update
> ------------------------------
>
>                 Key: FLINK-34937
>                 URL: https://issues.apache.org/jira/browse/FLINK-34937
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Build System / CI
>    Affects Versions: 1.19.0, 1.18.1, 1.20.0
>            Reporter: Matthias Pohl
>            Assignee: Matthias Pohl
>            Priority: Major
>              Labels: pull-request-available
>
> There is a policy update [announced in the infra ML|https://www.mail-archive.com/jdo-dev@db.apache.org/msg13638.html] which asked Apache projects to limit the number of runners per job. Additionally, the [GHA policy|https://infra.apache.org/github-actions-policy.html] is referenced, which I wasn't aware of when working on the action workflow.
> This issue is about applying the policy to the Flink GHA workflows.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (FLINK-34989) Apache Infra requests to reduce the runner usage for a project
[ https://issues.apache.org/jira/browse/FLINK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833910#comment-17833910 ]

Matthias Pohl commented on FLINK-34989:
---------------------------------------

[~martijnvisser] pointed out that we might need to fix this in the connector repos as well.

> Apache Infra requests to reduce the runner usage for a project
> --------------------------------------------------------------
>
>                 Key: FLINK-34989
>                 URL: https://issues.apache.org/jira/browse/FLINK-34989
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Build System / CI
>    Affects Versions: 1.19.0, 1.18.1, 1.20.0
>            Reporter: Matthias Pohl
>            Priority: Major
>              Labels: pull-request-available
>
> The GitHub Actions CI utilizes runners that are hosted by Apache Infra right now. These runners are limited. The runner usage can be monitored via the following links:
> * [Flink-specific report|https://infra-reports.apache.org/#ghactions&project=flink&hours=168] (needs ASF committer rights) This project-specific report can only be modified through the HTTP GET parameters of the URL.
> * [Global report|https://infra-reports.apache.org/#ghactions] (needs ASF membership)
> There was a policy change announced recently:
> {quote}
> Policy change on use of GitHub Actions
> Due to misconfigurations in their builds, some projects have been using unsupportable numbers of GitHub Actions. As part of fixing this situation, Infra has added a 'resource use' section to the policy on GitHub Actions. This section of the policy will come into effect on April 20, 2024:
> All workflows MUST have a job concurrency level less than or equal to 20. This means a workflow cannot have more than 20 jobs running at the same time across all matrices.
> All workflows SHOULD have a job concurrency level less than or equal to 15. Just because 20 is the max, doesn't mean you should strive for 20.
> The average number of minutes a project uses per calendar week MUST NOT exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 hours).
> The average number of minutes a project uses in any consecutive five-day period MUST NOT exceed the equivalent of 30 full-time runners (216,000 minutes, or 3,600 hours).
> Projects whose builds consistently cross the maximum use limits will lose their access to GitHub Actions until they fix their build configurations.
> The full policy is at https://infra.apache.org/github-actions-policy.html.
> {quote}
> Currently (last week of March 2024) Flink was ranked at #19 of the projects that used the Apache Infra runner resources the most, which doesn't seem too bad. This contained not only Apache Flink but also the Kubernetes operator, connectors and other resources. According to [this source|https://infra.apache.org/github-actions-secrets.html] Apache Infra manages 180 runners right now.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
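For matrix-based workflows, the job-concurrency cap quoted in the policy above maps to GitHub Actions' `strategy.max-parallel` setting. The following is a hypothetical workflow fragment, not taken from Flink's actual CI configuration; the job name, matrix values, and build command are invented for illustration:

```yaml
# Hypothetical sketch: keep at most 15 matrix jobs running concurrently,
# staying under the policy's hard limit of 20 jobs per workflow.
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      max-parallel: 15        # cap concurrent jobs across the whole matrix
      matrix:
        module: [core, table, python, connectors]   # placeholder modules
        jdk: [8, 11, 17]
    steps:
      - uses: actions/checkout@v4
      - run: echo "build ${{ matrix.module }} on JDK ${{ matrix.jdk }}"  # placeholder build step
```

Without `max-parallel`, a 4x3 matrix like this would schedule all 12 jobs at once; with it, GitHub queues anything beyond the cap.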
[jira] [Resolved] (FLINK-34999) PR CI stopped operating
[ https://issues.apache.org/jira/browse/FLINK-34999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Pohl resolved FLINK-34999.
-----------------------------------
    Resolution: Fixed

Thanks for working on it. I verified that [PR CI|https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=2] is picked up again. (y)

> PR CI stopped operating
> -----------------------
>
>                 Key: FLINK-34999
>                 URL: https://issues.apache.org/jira/browse/FLINK-34999
>             Project: Flink
>          Issue Type: Bug
>          Components: Build System / CI
>    Affects Versions: 1.19.0, 1.18.1, 1.20.0
>            Reporter: Matthias Pohl
>            Priority: Blocker
>
> There are no [new PR CI runs|https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=2] being picked up anymore. [Recently updated PRs|https://github.com/apache/flink/pulls?q=sort%3Aupdated-desc] are not picked up by the @flinkbot.
> In the meantime there was a notification sent from GitHub that the password of the [@flinkbot|https://github.com/flinkbot] was reset for security reasons. It's quite likely that these two events are related.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (FLINK-35005) SqlClientITCase Failed to build JobManager image
[ https://issues.apache.org/jira/browse/FLINK-35005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Pohl updated FLINK-35005:
----------------------------------
    Component/s: Test Infrastructure

> SqlClientITCase Failed to build JobManager image
> ------------------------------------------------
>
>                 Key: FLINK-35005
>                 URL: https://issues.apache.org/jira/browse/FLINK-35005
>             Project: Flink
>          Issue Type: Bug
>          Components: Test Infrastructure
>    Affects Versions: 1.20.0
>            Reporter: Ryan Skraba
>            Priority: Critical
>              Labels: test-stability
>
> jdk21
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58708&view=logs&j=dc1bf4ed-4646-531a-f094-e103042be549&t=fb3d654d-52f8-5b98-fe9d-b18dd2e2b790&l=15140
> {code}
> Apr 03 02:59:16 02:59:16.247 [INFO] -------------------------------------------------------
> Apr 03 02:59:16 02:59:16.248 [INFO]  T E S T S
> Apr 03 02:59:16 02:59:16.248 [INFO] -------------------------------------------------------
> Apr 03 02:59:17 02:59:17.841 [INFO] Running SqlClientITCase
> Apr 03 03:03:15     at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312)
> Apr 03 03:03:15     at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843)
> Apr 03 03:03:15     at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808)
> Apr 03 03:03:15     at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)
> Apr 03 03:03:15 Caused by: org.apache.flink.connector.testframe.container.ImageBuildException: Failed to build image "flink-configured-jobmanager"
> Apr 03 03:03:15     at org.apache.flink.connector.testframe.container.FlinkImageBuilder.build(FlinkImageBuilder.java:234)
> Apr 03 03:03:15     at org.apache.flink.connector.testframe.container.FlinkTestcontainersConfigurator.configureJobManagerContainer(FlinkTestcontainersConfigurator.java:65)
> Apr 03 03:03:15     ... 12 more
> Apr 03 03:03:15 Caused by: java.lang.RuntimeException: com.github.dockerjava.api.exception.DockerClientException: Could not build image: Head "https://registry-1.docker.io/v2/library/eclipse-temurin/manifests/21-jre-jammy": received unexpected HTTP status: 500 Internal Server Error
> Apr 03 03:03:15     at org.rnorth.ducttape.timeouts.Timeouts.callFuture(Timeouts.java:68)
> Apr 03 03:03:15     at org.rnorth.ducttape.timeouts.Timeouts.getWithTimeout(Timeouts.java:43)
> Apr 03 03:03:15     at org.testcontainers.utility.LazyFuture.get(LazyFuture.java:47)
> Apr 03 03:03:15     at org.apache.flink.connector.testframe.container.FlinkImageBuilder.buildBaseImage(FlinkImageBuilder.java:255)
> Apr 03 03:03:15     at org.apache.flink.connector.testframe.container.FlinkImageBuilder.build(FlinkImageBuilder.java:206)
> Apr 03 03:03:15     ... 13 more
> Apr 03 03:03:15 Caused by: com.github.dockerjava.api.exception.DockerClientException: Could not build image: Head "https://registry-1.docker.io/v2/library/eclipse-temurin/manifests/21-jre-jammy": received unexpected HTTP status: 500 Internal Server Error
> Apr 03 03:03:15     at com.github.dockerjava.api.command.BuildImageResultCallback.getImageId(BuildImageResultCallback.java:78)
> Apr 03 03:03:15     at com.github.dockerjava.api.command.BuildImageResultCallback.awaitImageId(BuildImageResultCallback.java:50)
> Apr 03 03:03:15     at org.testcontainers.images.builder.ImageFromDockerfile.resolve(ImageFromDockerfile.java:159)
> Apr 03 03:03:15     at org.testcontainers.images.builder.ImageFromDockerfile.resolve(ImageFromDockerfile.java:40)
> Apr 03 03:03:15     at org.testcontainers.utility.LazyFuture.getResolvedValue(LazyFuture.java:19)
> Apr 03 03:03:15     at org.testcontainers.utility.LazyFuture.get(LazyFuture.java:41)
> Apr 03 03:03:15     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
> Apr 03 03:03:15     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
> Apr 03 03:03:15     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
> Apr 03 03:03:15     at java.base/java.lang.Thread.run(Thread.java:1583)
> Apr 03 03:03:15
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (FLINK-35004) SqlGatewayE2ECase could not start container
[ https://issues.apache.org/jira/browse/FLINK-35004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Pohl updated FLINK-35004:
----------------------------------
    Component/s: Test Infrastructure

> SqlGatewayE2ECase could not start container
> -------------------------------------------
>
>                 Key: FLINK-35004
>                 URL: https://issues.apache.org/jira/browse/FLINK-35004
>             Project: Flink
>          Issue Type: Bug
>          Components: Test Infrastructure
>    Affects Versions: 1.20.0
>            Reporter: Ryan Skraba
>            Priority: Critical
>              Labels: github-actions, test-stability
>
> 1.20, jdk17: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58708&view=logs&j=e8e46ef5-75cc-564f-c2bd-1797c35cbebe&t=60c49903-2505-5c25-7e46-de91b1737bea&l=15078
> There is an error: "Process failed due to timeout" in {{SqlGatewayE2ECase.testSqlClientExecuteStatement}}. In the maven logs, we can see:
> {code:java}
> 02:57:26,979 [main] INFO  tc.prestodb/hdp2.6-hive:10 [] - Image prestodb/hdp2.6-hive:10 pull took PT43.59218S
> 02:57:26,991 [main] INFO  tc.prestodb/hdp2.6-hive:10 [] - Creating container for image: prestodb/hdp2.6-hive:10
> 02:57:27,032 [main] INFO  tc.prestodb/hdp2.6-hive:10 [] - Container prestodb/hdp2.6-hive:10 is starting: 162069678c7d03252a42ed81ca43e1911ca7357c476a4a5de294ffe55bd83145
> 02:57:42,846 [main] INFO  tc.prestodb/hdp2.6-hive:10 [] - Container prestodb/hdp2.6-hive:10 started in PT15.855339866S
> 02:57:53,447 [main] ERROR tc.prestodb/hdp2.6-hive:10 [] - Could not start container
> java.lang.RuntimeException: java.net.SocketTimeoutException: timeout
> 	at org.apache.flink.table.gateway.containers.HiveContainer.containerIsStarted(HiveContainer.java:94) ~[test-classes/:?]
> 	at org.testcontainers.containers.GenericContainer.containerIsStarted(GenericContainer.java:723) ~[testcontainers-1.19.1.jar:1.19.1]
> 	at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:543) ~[testcontainers-1.19.1.jar:1.19.1]
> 	at org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:354) ~[testcontainers-1.19.1.jar:1.19.1]
> 	at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81) ~[duct-tape-1.0.8.jar:?]
> 	at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:344) ~[testcontainers-1.19.1.jar:1.19.1]
> 	at org.apache.flink.table.gateway.containers.HiveContainer.doStart(HiveContainer.java:69) ~[test-classes/:?]
> 	at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:334) ~[testcontainers-1.19.1.jar:1.19.1]
> 	at org.testcontainers.containers.GenericContainer.starting(GenericContainer.java:1144) ~[testcontainers-1.19.1.jar:1.19.1]
> 	at org.testcontainers.containers.FailureDetectingExternalResource$1.evaluate(FailureDetectingExternalResource.java:28) ~[testcontainers-1.19.1.jar:1.19.1]
> 	at org.junit.rules.RunRules.evaluate(RunRules.java:20) ~[junit-4.13.2.jar:4.13.2]
> 	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) ~[junit-4.13.2.jar:4.13.2]
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:413) ~[junit-4.13.2.jar:4.13.2]
> 	at org.junit.runner.JUnitCore.run(JUnitCore.java:137) ~[junit-4.13.2.jar:4.13.2]
> 	at org.junit.runner.JUnitCore.run(JUnitCore.java:115) ~[junit-4.13.2.jar:4.13.2]
> 	at org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42) ~[junit-vintage-engine-5.10.1.jar:5.10.1]
> 	at org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80) ~[junit-vintage-engine-5.10.1.jar:5.10.1]
> 	at org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:72) ~[junit-vintage-engine-5.10.1.jar:5.10.1]
> 	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:198) ~[junit-platform-launcher-1.10.1.jar:1.10.1]
> 	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:169) ~[junit-platform-launcher-1.10.1.jar:1.10.1]
> 	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:93) ~[junit-platform-launcher-1.10.1.jar:1.10.1]
> 	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:58) ~[junit-platform-launcher-1.10.1.jar:1.10.1]
> 	at
[jira] [Resolved] (FLINK-35000) PullRequest template doesn't use the correct format to refer to the testing code convention
[ https://issues.apache.org/jira/browse/FLINK-35000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Pohl resolved FLINK-35000.
-----------------------------------
    Fix Version/s: 1.18.2
                   1.20.0
                   1.19.1
       Resolution: Fixed

master: [d301839dfe2ed9b1313d23f8307bda76868a0c0a|https://github.com/apache/flink/commit/d301839dfe2ed9b1313d23f8307bda76868a0c0a]
1.19: [eb58599b434b6c5fe86f6e487ce88315c98b4ec3|https://github.com/apache/flink/commit/eb58599b434b6c5fe86f6e487ce88315c98b4ec3]
1.18: [9150f93b18b8694646092a6ed24a14e3653f613f|https://github.com/apache/flink/commit/9150f93b18b8694646092a6ed24a14e3653f613f]

> PullRequest template doesn't use the correct format to refer to the testing code convention
> -------------------------------------------------------------------------------------------
>
>                 Key: FLINK-35000
>                 URL: https://issues.apache.org/jira/browse/FLINK-35000
>             Project: Flink
>          Issue Type: Bug
>          Components: Build System / CI, Project Website
>    Affects Versions: 1.19.0, 1.18.1, 1.20.0
>            Reporter: Matthias Pohl
>            Assignee: Matthias Pohl
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.18.2, 1.20.0, 1.19.1
>
>
> The PR template refers to https://flink.apache.org/contributing/code-style-and-quality-common.html#testing rather than https://flink.apache.org/how-to-contribute/code-style-and-quality-common/#7-testing

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (FLINK-35002) GitHub action/upload-artifact@v4 can timeout
[ https://issues.apache.org/jira/browse/FLINK-35002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Pohl updated FLINK-35002:
----------------------------------
    Labels: github-actions test-stability  (was: test-stability)

> GitHub action/upload-artifact@v4 can timeout
> --------------------------------------------
>
>                 Key: FLINK-35002
>                 URL: https://issues.apache.org/jira/browse/FLINK-35002
>             Project: Flink
>          Issue Type: Bug
>          Components: Build System
>            Reporter: Ryan Skraba
>            Priority: Major
>              Labels: github-actions, test-stability
>
> A timeout can occur when uploading a successfully built artifact:
> * [https://github.com/apache/flink/actions/runs/8516411871/job/23325392650]
> {code:java}
> 2024-04-02T02:20:15.6355368Z With the provided path, there will be 1 file uploaded
> 2024-04-02T02:20:15.6360133Z Artifact name is valid!
> 2024-04-02T02:20:15.6362872Z Root directory input is valid!
> 2024-04-02T02:20:20.6975036Z Attempt 1 of 5 failed with error: Request timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. Retrying request in 3000 ms...
> 2024-04-02T02:20:28.7084937Z Attempt 2 of 5 failed with error: Request timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. Retrying request in 4785 ms...
> 2024-04-02T02:20:38.5015936Z Attempt 3 of 5 failed with error: Request timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. Retrying request in 7375 ms...
> 2024-04-02T02:20:50.8901508Z Attempt 4 of 5 failed with error: Request timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact. Retrying request in 14988 ms...
> 2024-04-02T02:21:10.9028438Z ##[error]Failed to CreateArtifact: Failed to make request after 5 attempts: Request timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact
> 2024-04-02T02:22:59.9893296Z Post job cleanup.
> 2024-04-02T02:22:59.9958844Z Post job cleanup.
> {code}
> (This is unlikely to be something we can fix, but we can track it.)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (FLINK-34999) PR CI stopped operating
[ https://issues.apache.org/jira/browse/FLINK-34999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34999: -- Description: There are no [new PR CI runs|https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=2] being picked up anymore. [Recently updated PRs|https://github.com/apache/flink/pulls?q=sort%3Aupdated-desc] are not picked up by the @flinkbot. In the meantime there was a notification sent from GitHub that the password of the [@flinkbot|https://github.com/flinkbot] was reset for security reasons. It's quite likely that these two events are related. was: There are no [new PR CI runs|https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=2] being picked up anymore. [Recently updated PRs|https://github.com/apache/flink/pulls?q=sort%3Aupdated-desc] are not picked up by the @flinkbot. In the meantime there was a notification sent from GitHub that the password of the @flinkbot was reset for security reasons. It's quite likely that these two events are related. > PR CI stopped operating > --- > > Key: FLINK-34999 > URL: https://issues.apache.org/jira/browse/FLINK-34999 > Project: Flink > Issue Type: Bug > Components: Build System / CI >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Blocker > > There are no [new PR CI > runs|https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=2] > being picked up anymore. [Recently updated > PRs|https://github.com/apache/flink/pulls?q=sort%3Aupdated-desc] are not > picked up by the @flinkbot. > In the meantime there was a notification sent from GitHub that the password > of the [@flinkbot|https://github.com/flinkbot] was reset for security > reasons. It's quite likely that these two events are related. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35000) PullRequest template doesn't use the correct format to refer to the testing code convention
Matthias Pohl created FLINK-35000: - Summary: PullRequest template doesn't use the correct format to refer to the testing code convention Key: FLINK-35000 URL: https://issues.apache.org/jira/browse/FLINK-35000 Project: Flink Issue Type: Bug Components: Build System / CI, Project Website Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl The PR template refers to https://flink.apache.org/contributing/code-style-and-quality-common.html#testing rather than https://flink.apache.org/how-to-contribute/code-style-and-quality-common/#7-testing -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-35000) PullRequest template doesn't use the correct format to refer to the testing code convention
[ https://issues.apache.org/jira/browse/FLINK-35000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl reassigned FLINK-35000: - Assignee: Matthias Pohl > PullRequest template doesn't use the correct format to refer to the testing > code convention > --- > > Key: FLINK-35000 > URL: https://issues.apache.org/jira/browse/FLINK-35000 > Project: Flink > Issue Type: Bug > Components: Build System / CI, Project Website >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Minor > > The PR template refers to > https://flink.apache.org/contributing/code-style-and-quality-common.html#testing > rather than > https://flink.apache.org/how-to-contribute/code-style-and-quality-common/#7-testing -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34999) PR CI stopped operating
[ https://issues.apache.org/jira/browse/FLINK-34999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833523#comment-17833523 ] Matthias Pohl commented on FLINK-34999: --- CC [~uce] [~Weijie Guo] [~fanrui] [~rmetzger] CC [~jingge] since it might be Ververica infrastructure-related > PR CI stopped operating > --- > > Key: FLINK-34999 > URL: https://issues.apache.org/jira/browse/FLINK-34999 > Project: Flink > Issue Type: Bug > Components: Build System / CI >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Blocker > > There are no [new PR CI > runs|https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=2] > being picked up anymore. [Recently updated > PRs|https://github.com/apache/flink/pulls?q=sort%3Aupdated-desc] are not > picked up by the @flinkbot. > In the meantime there was a notification sent from GitHub that the password > of the @flinkbot was reset for security reasons. It's quite likely that these > two events are related. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34999) PR CI stopped operating
Matthias Pohl created FLINK-34999: - Summary: PR CI stopped operating Key: FLINK-34999 URL: https://issues.apache.org/jira/browse/FLINK-34999 Project: Flink Issue Type: Bug Components: Build System / CI Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl There are no [new PR CI runs|https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=2] being picked up anymore. [Recently updated PRs|https://github.com/apache/flink/pulls?q=sort%3Aupdated-desc] are not picked up by the @flinkbot. In the meantime there was a notification sent from GitHub that the password of the @flinkbot was reset for security reasons. It's quite likely that these two events are related. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34997) PyFlink YARN per-job on Docker test failed on azure
[ https://issues.apache.org/jira/browse/FLINK-34997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833505#comment-17833505 ] Matthias Pohl commented on FLINK-34997: --- The issue seems to be that the {{docker-compose}} binary is missing on the Azure VMs. > PyFlink YARN per-job on Docker test failed on azure > --- > > Key: FLINK-34997 > URL: https://issues.apache.org/jira/browse/FLINK-34997 > Project: Flink > Issue Type: Bug > Components: Build System / CI >Affects Versions: 1.20.0 >Reporter: Weijie Guo >Priority: Blocker > Labels: test-stability > > {code} > Apr 03 03:12:37 > == > Apr 03 03:12:37 Running 'PyFlink YARN per-job on Docker test' > Apr 03 03:12:37 > == > Apr 03 03:12:37 TEST_DATA_DIR: > /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-37046085202 > Apr 03 03:12:37 Flink dist directory: > /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT > Apr 03 03:12:38 Flink dist directory: > /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT > Apr 03 03:12:38 Docker version 24.0.9, build 2936816 > /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/common_docker.sh: > line 24: docker-compose: command not found > Apr 03 03:12:38 [FAIL] Test script contains errors. > Apr 03 03:12:38 Checking of logs skipped. > Apr 03 03:12:38 > Apr 03 03:12:38 [FAIL] 'PyFlink YARN per-job on Docker test' failed after 0 > minutes and 1 seconds! Test exited with exit code 1 > {code} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58709=logs=f8e16326-dc75-5ba0-3e95-6178dd55bf6c=94ccd692-49fc-5c64-8775-d427c6e65440=10226 -- This message was sent by Atlassian Jira (v8.20.10#820010)
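This failure mode (the standalone Compose v1 binary missing while Docker itself is present) is commonly worked around by falling back to the `docker compose` v2 plugin. A minimal sketch for a Bash test script; `detect_compose` is a hypothetical helper, not the actual Flink fix:

```shell
#!/usr/bin/env bash
# Hypothetical fallback: prefer the legacy docker-compose v1 binary, fall
# back to the "docker compose" v2 plugin, and fail loudly if neither works.
detect_compose() {
    if command -v docker-compose >/dev/null 2>&1; then
        echo "docker-compose"
    elif docker compose version >/dev/null 2>&1; then
        echo "docker compose"
    else
        echo "ERROR: neither docker-compose nor the docker compose plugin found" >&2
        return 1
    fi
}

# Usage in a test script:
#   COMPOSE="$(detect_compose)" || exit 1
#   $COMPOSE up -d
```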
[jira] [Updated] (FLINK-34997) PyFlink YARN per-job on Docker test failed on azure
[ https://issues.apache.org/jira/browse/FLINK-34997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34997: -- Labels: test-stability (was: ) > PyFlink YARN per-job on Docker test failed on azure > --- > > Key: FLINK-34997 > URL: https://issues.apache.org/jira/browse/FLINK-34997 > Project: Flink > Issue Type: Bug > Components: Build System / CI >Affects Versions: 1.20.0 >Reporter: Weijie Guo >Priority: Major > Labels: test-stability > > {code} > Apr 03 03:12:37 > == > Apr 03 03:12:37 Running 'PyFlink YARN per-job on Docker test' > Apr 03 03:12:37 > == > Apr 03 03:12:37 TEST_DATA_DIR: > /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-37046085202 > Apr 03 03:12:37 Flink dist directory: > /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT > Apr 03 03:12:38 Flink dist directory: > /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT > Apr 03 03:12:38 Docker version 24.0.9, build 2936816 > /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/common_docker.sh: > line 24: docker-compose: command not found > Apr 03 03:12:38 [FAIL] Test script contains errors. > Apr 03 03:12:38 Checking of logs skipped. > Apr 03 03:12:38 > Apr 03 03:12:38 [FAIL] 'PyFlink YARN per-job on Docker test' failed after 0 > minutes and 1 seconds! Test exited with exit code 1 > {code} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58709=logs=f8e16326-dc75-5ba0-3e95-6178dd55bf6c=94ccd692-49fc-5c64-8775-d427c6e65440=10226 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34998) Wordcount on Docker test failed on azure
[ https://issues.apache.org/jira/browse/FLINK-34998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833504#comment-17833504 ] Matthias Pohl commented on FLINK-34998: --- I guess this one is a duplicate of FLINK-34997. In the end, the error happens because the {{docker-compose}} binary is missing on the Azure VMs. WDYT? > Wordcount on Docker test failed on azure > > > Key: FLINK-34998 > URL: https://issues.apache.org/jira/browse/FLINK-34998 > Project: Flink > Issue Type: Bug > Components: Build System / CI >Affects Versions: 1.20.0 >Reporter: Weijie Guo >Priority: Major > > /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_docker_embedded_job.sh: > line 65: docker-compose: command not found > /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_docker_embedded_job.sh: > line 66: docker-compose: command not found > /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_docker_embedded_job.sh: > line 67: docker-compose: command not found > sort: cannot read: > '/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-24250435151/out/docker_wc_out*': > No such file or directory > Apr 03 02:08:14 FAIL WordCount: Output hash mismatch. Got > d41d8cd98f00b204e9800998ecf8427e, expected 0e5bd0a3dd7d5a7110aa85ff70adb54b. > Apr 03 02:08:14 head hexdump of actual: > head: cannot open > '/home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-24250435151/out/docker_wc_out*' > for reading: No such file or directory > Apr 03 02:08:14 Stopping job timeout watchdog (with pid=244913) > Apr 03 02:08:14 [FAIL] Test script contains errors. > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58709=logs=e9d3d34f-3d15-59f4-0e3e-35067d100dfe=5d91035e-8022-55f2-2d4f-ab121508bf7e=6043 -- This message was sent by Atlassian Jira (v8.20.10#820010)
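One detail supporting the duplicate theory: the "Got" hash in the log, d41d8cd98f00b204e9800998ecf8427e, is the MD5 digest of zero bytes of input, i.e. the wordcount output file was empty or never produced rather than containing wrong data. A quick check:

```shell
# d41d8cd98f00b204e9800998ecf8427e is the well-known MD5 of empty input,
# so the hash mismatch above points at a missing/empty out file.
printf '' | md5sum | cut -d' ' -f1
# prints: d41d8cd98f00b204e9800998ecf8427e
```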
[jira] [Updated] (FLINK-34997) PyFlink YARN per-job on Docker test failed on azure
[ https://issues.apache.org/jira/browse/FLINK-34997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34997: -- Description: {code} Apr 03 03:12:37 == Apr 03 03:12:37 Running 'PyFlink YARN per-job on Docker test' Apr 03 03:12:37 == Apr 03 03:12:37 TEST_DATA_DIR: /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-37046085202 Apr 03 03:12:37 Flink dist directory: /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT Apr 03 03:12:38 Flink dist directory: /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT Apr 03 03:12:38 Docker version 24.0.9, build 2936816 /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/common_docker.sh: line 24: docker-compose: command not found Apr 03 03:12:38 [FAIL] Test script contains errors. Apr 03 03:12:38 Checking of logs skipped. Apr 03 03:12:38 Apr 03 03:12:38 [FAIL] 'PyFlink YARN per-job on Docker test' failed after 0 minutes and 1 seconds! Test exited with exit code 1 {code} https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58709=logs=f8e16326-dc75-5ba0-3e95-6178dd55bf6c=94ccd692-49fc-5c64-8775-d427c6e65440=10226 was: Apr 03 03:12:37 == Apr 03 03:12:37 Running 'PyFlink YARN per-job on Docker test' Apr 03 03:12:37 == Apr 03 03:12:37 TEST_DATA_DIR: /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-37046085202 Apr 03 03:12:37 Flink dist directory: /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT Apr 03 03:12:38 Flink dist directory: /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT Apr 03 03:12:38 Docker version 24.0.9, build 2936816 /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/common_docker.sh: line 24: docker-compose: command not found Apr 03 03:12:38 [FAIL] Test script contains errors. Apr 03 03:12:38 Checking of logs skipped. 
Apr 03 03:12:38 Apr 03 03:12:38 [FAIL] 'PyFlink YARN per-job on Docker test' failed after 0 minutes and 1 seconds! Test exited with exit code 1 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58709=logs=f8e16326-dc75-5ba0-3e95-6178dd55bf6c=94ccd692-49fc-5c64-8775-d427c6e65440=10226 > PyFlink YARN per-job on Docker test failed on azure > --- > > Key: FLINK-34997 > URL: https://issues.apache.org/jira/browse/FLINK-34997 > Project: Flink > Issue Type: Bug > Components: Build System / CI >Affects Versions: 1.20.0 >Reporter: Weijie Guo >Priority: Major > > {code} > Apr 03 03:12:37 > == > Apr 03 03:12:37 Running 'PyFlink YARN per-job on Docker test' > Apr 03 03:12:37 > == > Apr 03 03:12:37 TEST_DATA_DIR: > /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-37046085202 > Apr 03 03:12:37 Flink dist directory: > /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT > Apr 03 03:12:38 Flink dist directory: > /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT > Apr 03 03:12:38 Docker version 24.0.9, build 2936816 > /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/common_docker.sh: > line 24: docker-compose: command not found > Apr 03 03:12:38 [FAIL] Test script contains errors. > Apr 03 03:12:38 Checking of logs skipped. > Apr 03 03:12:38 > Apr 03 03:12:38 [FAIL] 'PyFlink YARN per-job on Docker test' failed after 0 > minutes and 1 seconds! Test exited with exit code 1 > {code} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58709=logs=f8e16326-dc75-5ba0-3e95-6178dd55bf6c=94ccd692-49fc-5c64-8775-d427c6e65440=10226 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34997) PyFlink YARN per-job on Docker test failed on azure
[ https://issues.apache.org/jira/browse/FLINK-34997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34997: -- Priority: Blocker (was: Major) > PyFlink YARN per-job on Docker test failed on azure > --- > > Key: FLINK-34997 > URL: https://issues.apache.org/jira/browse/FLINK-34997 > Project: Flink > Issue Type: Bug > Components: Build System / CI >Affects Versions: 1.20.0 >Reporter: Weijie Guo >Priority: Blocker > Labels: test-stability > > {code} > Apr 03 03:12:37 > == > Apr 03 03:12:37 Running 'PyFlink YARN per-job on Docker test' > Apr 03 03:12:37 > == > Apr 03 03:12:37 TEST_DATA_DIR: > /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-37046085202 > Apr 03 03:12:37 Flink dist directory: > /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT > Apr 03 03:12:38 Flink dist directory: > /home/vsts/work/1/s/flink-dist/target/flink-1.19-SNAPSHOT-bin/flink-1.19-SNAPSHOT > Apr 03 03:12:38 Docker version 24.0.9, build 2936816 > /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/common_docker.sh: > line 24: docker-compose: command not found > Apr 03 03:12:38 [FAIL] Test script contains errors. > Apr 03 03:12:38 Checking of logs skipped. > Apr 03 03:12:38 > Apr 03 03:12:38 [FAIL] 'PyFlink YARN per-job on Docker test' failed after 0 > minutes and 1 seconds! Test exited with exit code 1 > {code} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58709=logs=f8e16326-dc75-5ba0-3e95-6178dd55bf6c=94ccd692-49fc-5c64-8775-d427c6e65440=10226 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34643) JobIDLoggingITCase failed
[ https://issues.apache.org/jira/browse/FLINK-34643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833414#comment-17833414 ] Matthias Pohl commented on FLINK-34643: --- I guess reopening the issue would be fine. But for the sake of not putting too much into a single ticket, it wouldn't be wrong to create a new ticket and link FLINK-34643 as the cause, either. I personally would go for the latter option. > JobIDLoggingITCase failed > - > > Key: FLINK-34643 > URL: https://issues.apache.org/jira/browse/FLINK-34643 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.20.0 >Reporter: Matthias Pohl >Assignee: Roman Khachatryan >Priority: Major > Labels: pull-request-available, test-stability > Fix For: 1.20.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=7897 > {code} > Mar 09 01:24:23 01:24:23.498 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 4.209 s <<< FAILURE! -- in > org.apache.flink.test.misc.JobIDLoggingITCase > Mar 09 01:24:23 01:24:23.498 [ERROR] > org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(ClusterClient) > -- Time elapsed: 1.459 s <<< ERROR!
> Mar 09 01:24:23 java.lang.IllegalStateException: Too few log events recorded > for org.apache.flink.runtime.jobmaster.JobMaster (12) - this must be a bug in > the test code > Mar 09 01:24:23 at > org.apache.flink.util.Preconditions.checkState(Preconditions.java:215) > Mar 09 01:24:23 at > org.apache.flink.test.misc.JobIDLoggingITCase.assertJobIDPresent(JobIDLoggingITCase.java:148) > Mar 09 01:24:23 at > org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(JobIDLoggingITCase.java:132) > Mar 09 01:24:23 at java.lang.reflect.Method.invoke(Method.java:498) > Mar 09 01:24:23 at > java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) > Mar 09 01:24:23 > {code} > The other test failures of this build were also caused by the same test: > * > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=2c3cbe13-dee0-5837-cf47-3053da9a8a78=b78d9d30-509a-5cea-1fef-db7abaa325ae=8349 > * > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=8209 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-34989) Apache Infra requests to reduce the runner usage for a project
[ https://issues.apache.org/jira/browse/FLINK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833154#comment-17833154 ] Matthias Pohl edited comment on FLINK-34989 at 4/2/24 12:18 PM: This Jira issue is about adding job concurrency support. Ideally, we should make it configurable in an easy way and set it to a concurrency level >20 as requested by Apache Infra. This affects the nightly builds which run per branch with 5 different test profiles and each test profile having 11 runners (10 stages + a short-running license check) being occupied in parallel. Generally, we should make CI be more selective anyway. Apache Infra constantly criticizes projects for running heavy-load CI on changes like simple doc changes (see [here|https://infra.apache.org/github-actions-secrets.html]). was (Author: mapohl): This Jira issue is about adding job concurrency support. Ideally, we should make it configurable in an easy way and set it to a concurrency level >20 as requested by Apache Infra. This affects the nightly builds which run per branch with 5 different test profiles and each test profile having 11 runners (10 stages + a short-running license check) being occupied in parallel. Generally, we should make CI be more selective anyway. Apache Infra constantly criticizes projects to run heavy-load CI for things like simple doc changes. > Apache Infra requests to reduce the runner usage for a project > -- > > Key: FLINK-34989 > URL: https://issues.apache.org/jira/browse/FLINK-34989 > Project: Flink > Issue Type: Sub-task > Components: Build System / CI >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Major > Labels: pull-request-available > > The GitHub Actions CI utilizes runners that are hosted by Apache Infra right > now. These runners are limited. 
The runner usage can be monitored via the > following links: > * [Flink-specific > report|https://infra-reports.apache.org/#ghactions=flink=168] > (needs ASF committer rights) This project-specific report can only be > modified through the HTTP GET parameters of the URL. > * [Global report|https://infra-reports.apache.org/#ghactions] (needs ASF > membership) > There was a policy change announced recently: > {quote} > Policy change on use of GitHub Actions > Due to misconfigurations in their builds, some projects have been using > unsupportable numbers of GitHub Actions. As part of fixing this situation, > Infra has added a 'resource use' section to the policy on GitHub Actions. > This section of the policy will come into effect on April 20, 2024: > All workflows MUST have a job concurrency level less than or equal to 20. > This means a workflow cannot have more than 20 jobs running at the same time > across all matrices. > All workflows SHOULD have a job concurrency level less than or equal to 15. > Just because 20 is the max, doesn't mean you should strive for 20. > The average number of minutes a project uses per calendar week MUST NOT > exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 > hours). > The average number of minutes a project uses in any consecutive five-day > period MUST NOT exceed the equivalent of 30 full-time runners (216,000 > minutes, or 3,600 hours). > Projects whose builds consistently cross the maximum use limits will lose > their access to GitHub Actions until they fix their build configurations. > The full policy is at > https://infra.apache.org/github-actions-policy.html. > {quote} > Currently (last week of March 2024) Flink was ranked at #19 of projects that > used the Apache Infra runner resources the most which doesn't seem too bad. > This contained not only Apache Flink but also the Kubernetes operator, > connectors and other resources. 
According to [this > source|https://infra.apache.org/github-actions-secrets.html] Apache Infra > manages 180 runners right now. -- This message was sent by Atlassian Jira (v8.20.10#820010)
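The numbers in the comment above make the problem concrete: five nightly test profiles with 11 parallel runners each would occupy 55 jobs at peak, well beyond the quoted hard limit of 20 concurrent jobs per workflow. A back-of-the-envelope check (profile and runner counts taken from the comment, the limit from the quoted policy):

```shell
# Peak concurrent jobs of the nightly builds vs. Apache Infra's hard cap.
profiles=5              # nightly test profiles per branch
runners_per_profile=11  # 10 stages + 1 short-running license check
limit=20                # MUST-level cap from the Infra policy

jobs=$((profiles * runners_per_profile))
echo "peak concurrent jobs: $jobs (limit: $limit)"
[ "$jobs" -gt "$limit" ] && echo "over the limit by $((jobs - limit)) jobs"
```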
[jira] [Updated] (FLINK-34989) Apache Infra requests to reduce the runner usage for a project
[ https://issues.apache.org/jira/browse/FLINK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34989: -- Description: The GitHub Actions CI utilizes runners that are hosted by Apache Infra right now. These runners are limited. The runner usage can be monitored via the following links: * [Flink-specific report|https://infra-reports.apache.org/#ghactions=flink=168] (needs ASF committer rights) This project-specific report can only be modified through the HTTP GET parameters of the URL. * [Global report|https://infra-reports.apache.org/#ghactions] (needs ASF membership) There was a policy change announced recently: {quote} Policy change on use of GitHub Actions Due to misconfigurations in their builds, some projects have been using unsupportable numbers of GitHub Actions. As part of fixing this situation, Infra has added a 'resource use' section to the policy on GitHub Actions. This section of the policy will come into effect on April 20, 2024: All workflows MUST have a job concurrency level less than or equal to 20. This means a workflow cannot have more than 20 jobs running at the same time across all matrices. All workflows SHOULD have a job concurrency level less than or equal to 15. Just because 20 is the max, doesn't mean you should strive for 20. The average number of minutes a project uses per calendar week MUST NOT exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 hours). The average number of minutes a project uses in any consecutive five-day period MUST NOT exceed the equivalent of 30 full-time runners (216,000 minutes, or 3,600 hours). Projects whose builds consistently cross the maximum use limits will lose their access to GitHub Actions until they fix their build configurations. The full policy is at https://infra.apache.org/github-actions-policy.html. 
{quote} Currently (last week of March 2024) Flink was ranked at #19 of projects that used the Apache Infra runner resources the most which doesn't seem too bad. This contained not only Apache Flink but also the Kubernetes operator, connectors and other resources. According to [this source|https://infra.apache.org/github-actions-secrets.html] Apache Infra manages 180 runners right now. was: The GitHub Actions CI utilizes runners that are hosted by Apache Infra right now. These runners are limited. The runner usage can be monitored via the following links: * [Flink-specific report|https://infra-reports.apache.org/#ghactions=flink=168] (needs ASF committer rights) This project-specific report can only be modified through the HTTP GET parameters of the URL. * [Global report|https://infra-reports.apache.org/#ghactions] (needs ASF membership) There was a policy change announced recently: {quote} Policy change on use of GitHub Actions Due to misconfigurations in their builds, some projects have been using unsupportable numbers of GitHub Actions. As part of fixing this situation, Infra has added a 'resource use' section to the policy on GitHub Actions. This section of the policy will come into effect on April 20, 2024: All workflows MUST have a job concurrency level less than or equal to 20. This means a workflow cannot have more than 20 jobs running at the same time across all matrices. All workflows SHOULD have a job concurrency level less than or equal to 15. Just because 20 is the max, doesn't mean you should strive for 20. The average number of minutes a project uses per calendar week MUST NOT exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 hours). The average number of minutes a project uses in any consecutive five-day period MUST NOT exceed the equivalent of 30 full-time runners (216,000 minutes, or 3,600 hours). 
Projects whose builds consistently cross the maximum use limits will lose their access to GitHub Actions until they fix their build configurations. The full policy is at https://infra.apache.org/github-actions-policy.html. {quote} Currently (last week of March 2024) Flink was ranked at #19 of projects that used the Apache Infra runner resources the most which doesn't seem too bad. This contained not only Apache Flink but also the Kubernetes operator, connectors and other resources. > Apache Infra requests to reduce the runner usage for a project > -- > > Key: FLINK-34989 > URL: https://issues.apache.org/jira/browse/FLINK-34989 > Project: Flink > Issue Type: Sub-task > Components: Build System / CI >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Major > Labels: pull-request-available > > The GitHub Actions CI utilizes runners that are hosted by Apache Infra right > now. These runners are limited. The runner usage can be monitored via the >
[jira] [Commented] (FLINK-34989) Apache Infra requests to reduce the runner usage for a project
[ https://issues.apache.org/jira/browse/FLINK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833155#comment-17833155 ] Matthias Pohl commented on FLINK-34989: --- For this issue, we should keep in mind that it only affects the non-ephemeral runners. FLINK-34331 works on enabling ephemeral runners for Apache Flink. Ephemeral runners would allow us to donate project-specific runners, i.e. someone could donate hardware to allow Flink to have its own runners without having to worry too much about blocking other projects with CI. > Apache Infra requests to reduce the runner usage for a project > -- > > Key: FLINK-34989 > URL: https://issues.apache.org/jira/browse/FLINK-34989 > Project: Flink > Issue Type: Sub-task > Components: Build System / CI >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Major > > The GitHub Actions CI utilizes runners that are hosted by Apache Infra right > now. These runners are limited. The runner usage can be monitored via the > following links: > * [Flink-specific > report|https://infra-reports.apache.org/#ghactions=flink=168] > (needs ASF committer rights) This project-specific report can only be > modified through the HTTP GET parameters of the URL. > * [Global report|https://infra-reports.apache.org/#ghactions] (needs ASF > membership) > There was a policy change announced recently: > {quote} > Policy change on use of GitHub Actions > Due to misconfigurations in their builds, some projects have been using > unsupportable numbers of GitHub Actions. As part of fixing this situation, > Infra has added a 'resource use' section to the policy on GitHub Actions. > This section of the policy will come into effect on April 20, 2024: > All workflows MUST have a job concurrency level less than or equal to 20. > This means a workflow cannot have more than 20 jobs running at the same time > across all matrices. > All workflows SHOULD have a job concurrency level less than or equal to 15. 
> Just because 20 is the max, doesn't mean you should strive for 20. > The average number of minutes a project uses per calendar week MUST NOT > exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 > hours). > The average number of minutes a project uses in any consecutive five-day > period MUST NOT exceed the equivalent of 30 full-time runners (216,000 > minutes, or 3,600 hours). > Projects whose builds consistently cross the maximum use limits will lose > their access to GitHub Actions until they fix their build configurations. > The full policy is at > https://infra.apache.org/github-actions-policy.html. > {quote} > Currently (last week of March 2024) Flink was ranked at #19 of projects that > used the Apache Infra runner resources the most which doesn't seem too bad. > This contained not only Apache Flink but also the Kubernetes operator, > connectors and other resources. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34331) Enable Apache INFRA ephemeral runners for nightly builds
[ https://issues.apache.org/jira/browse/FLINK-34331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34331: -- Summary: Enable Apache INFRA ephemeral runners for nightly builds (was: Enable Apache INFRA runners for nightly builds) > Enable Apache INFRA ephemeral runners for nightly builds > > > Key: FLINK-34331 > URL: https://issues.apache.org/jira/browse/FLINK-34331 > Project: Flink > Issue Type: Sub-task > Components: Build System / CI >Affects Versions: 1.19.0, 1.18.1 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Major > Labels: pull-request-available > > The nightly CI is currently still utilizing the GitHub runners. We want to > switch to Apache INFRA's ephemeral runners (see > [docs|https://cwiki.apache.org/confluence/display/INFRA/ASF+Infra+provided+self-hosted+runners]). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34989) Apache Infra requests to reduce the runner usage for a project
[ https://issues.apache.org/jira/browse/FLINK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833154#comment-17833154 ] Matthias Pohl commented on FLINK-34989: --- This Jira issue is about adding support for limiting job concurrency. Ideally, we should make it easily configurable and set the concurrency level to at most 20, as requested by Apache Infra. This affects the nightly builds, which run per branch with 5 different test profiles, each test profile occupying 11 runners in parallel (10 stages + a short-running license check). Generally, we should make CI more selective anyway. Apache Infra regularly criticizes projects for running heavy-load CI for things like simple doc changes.
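The per-workflow cap discussed in the comment above maps to the `strategy.max-parallel` setting in a GitHub Actions workflow, which bounds how many matrix jobs run at the same time. The following is only a hypothetical sketch: the profile and stage names are invented for illustration and are not Flink's actual CI matrix.

```yaml
# Hypothetical sketch, not Flink's actual workflow definition.
jobs:
  test:
    strategy:
      # ASF policy: MUST be <= 20 concurrent jobs, SHOULD be <= 15.
      max-parallel: 15
      matrix:
        # Illustrative profile/stage names only.
        profile: [default, adaptive-scheduler, jdk17]
        stage: [core, table, connect, tests, misc, license-check]
    runs-on: ubuntu-latest
    steps:
      - run: ./tools/ci/run_stage.sh "${{ matrix.profile }}" "${{ matrix.stage }}"
```

Without `max-parallel`, a 3x6 matrix like this one would occupy up to 18 runners at once per workflow run; the cap keeps it at 15 regardless of matrix size.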
[jira] [Commented] (FLINK-34989) Apache Infra requests to reduce the runner usage for a project
[ https://issues.apache.org/jira/browse/FLINK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833153#comment-17833153 ] Matthias Pohl commented on FLINK-34989: --- Here's a summary of the requirements and whether we meet them based on the most recent report:
|| Requirement || Flink CI ||
| Job concurrency level of 20 (or, better, 15) or below | (n) |
| Do not exceed 25 full-time (FT) runners, i.e. 4,200 hours per 7 days | (y) |
| Do not exceed 3,600 hours of usage in any consecutive 5-day period | (y) |
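As a sanity check on the two usage budgets in the table and the quoted policy, the hour figures follow directly from the runner counts. Note that 4,200 hours equals 252,000 minutes, so the "250,000 minutes" in the quoted policy text appears to be rounded.

```java
// Cross-checking the ASF runner-usage budgets quoted in the policy above.
public class RunnerBudget {
    public static void main(String[] args) {
        int weeklyBudgetHours = 25 * 7 * 24;   // 25 full-time runners for 7 days
        int fiveDayBudgetHours = 30 * 5 * 24;  // 30 full-time runners for 5 days

        System.out.println(weeklyBudgetHours);        // 4200 hours
        System.out.println(weeklyBudgetHours * 60);   // 252000 minutes (policy text says 250,000)
        System.out.println(fiveDayBudgetHours);       // 3600 hours
        System.out.println(fiveDayBudgetHours * 60);  // 216000 minutes, matching the policy
    }
}
```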
[jira] [Updated] (FLINK-34989) Apache Infra requests to reduce the runner usage for a project
[ https://issues.apache.org/jira/browse/FLINK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34989: -- Description: The GitHub Actions CI utilizes runners that are hosted by Apache Infra right now. These runners are limited. The runner usage can be monitored via the following links: * [Flink-specific report|https://infra-reports.apache.org/#ghactions=flink=168] (needs ASF committer rights) This project-specific report can only be modified through the HTTP GET parameters of the URL. * [Global report|https://infra-reports.apache.org/#ghactions] (needs ASF membership) There was a policy change announced recently: {quote} Policy change on use of GitHub Actions Due to misconfigurations in their builds, some projects have been using unsupportable numbers of GitHub Actions. As part of fixing this situation, Infra has added a 'resource use' section to the policy on GitHub Actions. This section of the policy will come into effect on April 20, 2024: All workflows MUST have a job concurrency level less than or equal to 20. This means a workflow cannot have more than 20 jobs running at the same time across all matrices. All workflows SHOULD have a job concurrency level less than or equal to 15. Just because 20 is the max, doesn't mean you should strive for 20. The average number of minutes a project uses per calendar week MUST NOT exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 hours). The average number of minutes a project uses in any consecutive five-day period MUST NOT exceed the equivalent of 30 full-time runners (216,000 minutes, or 3,600 hours). Projects whose builds consistently cross the maximum use limits will lose their access to GitHub Actions until they fix their build configurations. The full policy is at https://infra.apache.org/github-actions-policy.html. 
{quote} Currently (last week of March 2024) Flink was ranked at #19 of projects that used the Apache Infra runner resources the most which doesn't seem too bad. This contained not only Apache Flink but also the Kubernetes operator, connectors and other resources. was: The GitHub Actions CI utilizes runners that are hosted by Apache Infra right now. These runners are limited. The runner usage can be monitored via the following links: * [Flink-specific report|https://infra-reports.apache.org/#ghactions=flink=168] (needs ASF committer rights) This project-specific report can only be modified through the HTTP GET parameters of the URL. * [Global report|https://infra-reports.apache.org/#ghactions] (needs ASF membership) There was a policy change announced recently: {quote} Policy change on use of GitHub Actions Due to misconfigurations in their builds, some projects have been using unsupportable numbers of GitHub Actions. As part of fixing this situation, Infra has added a 'resource use' section to the policy on GitHub Actions. This section of the policy will come into effect on April 20, 2024: All workflows MUST have a job concurrency level less than or equal to 20. This means a workflow cannot have more than 20 jobs running at the same time across all matrices. All workflows SHOULD have a job concurrency level less than or equal to 15. Just because 20 is the max, doesn't mean you should strive for 20. The average number of minutes a project uses per calendar week MUST NOT exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 hours). The average number of minutes a project uses in any consecutive five-day period MUST NOT exceed the equivalent of 30 full-time runners (216,000 minutes, or 3,600 hours). Projects whose builds consistently cross the maximum use limits will lose their access to GitHub Actions until they fix their build configurations. The full policy is at https://infra.apache.org/github-actions-policy.html. 
{quote}
[jira] [Updated] (FLINK-34989) Apache Infra requests to reduce the runner usage for a project
[ https://issues.apache.org/jira/browse/FLINK-34989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34989: -- Description: The GitHub Actions CI utilizes runners that are hosted by Apache Infra right now. These runners are limited. The runner usage can be monitored via the following links: * [Flink-specific report|https://infra-reports.apache.org/#ghactions=flink=168] (needs ASF committer rights) This project-specific report can only be modified through the HTTP GET parameters of the URL. * [Global report|https://infra-reports.apache.org/#ghactions] (needs ASF membership) There was a policy change announced recently: {quote} Policy change on use of GitHub Actions Due to misconfigurations in their builds, some projects have been using unsupportable numbers of GitHub Actions. As part of fixing this situation, Infra has added a 'resource use' section to the policy on GitHub Actions. This section of the policy will come into effect on April 20, 2024: All workflows MUST have a job concurrency level less than or equal to 20. This means a workflow cannot have more than 20 jobs running at the same time across all matrices. All workflows SHOULD have a job concurrency level less than or equal to 15. Just because 20 is the max, doesn't mean you should strive for 20. The average number of minutes a project uses per calendar week MUST NOT exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 hours). The average number of minutes a project uses in any consecutive five-day period MUST NOT exceed the equivalent of 30 full-time runners (216,000 minutes, or 3,600 hours). Projects whose builds consistently cross the maximum use limits will lose their access to GitHub Actions until they fix their build configurations. The full policy is at https://infra.apache.org/github-actions-policy.html. {quote} was: The GitHub Actions CI utilizes runners that are hosted by Apache Infra right now. These runners are limited. 
The runner usage can be monitored via the following links: * [Flink-specific report|https://infra-reports.apache.org/#ghactions=flink=168] (needs ASF committer rights) This project-specific report can only be modified through the HTTP GET parameters of the URL. * [Global report|https://infra-reports.apache.org/#ghactions] (needs ASF membership)
[jira] [Commented] (FLINK-34937) Apache Infra GHA policy update
[ https://issues.apache.org/jira/browse/FLINK-34937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833149#comment-17833149 ] Matthias Pohl commented on FLINK-34937: --- I moved the runner usage discussion into FLINK-34989 > Apache Infra GHA policy update > -- > > Key: FLINK-34937 > URL: https://issues.apache.org/jira/browse/FLINK-34937 > Project: Flink > Issue Type: Sub-task > Components: Build System / CI >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Major > > There is a policy update [announced in the infra > ML|https://www.mail-archive.com/jdo-dev@db.apache.org/msg13638.html] which > asked Apache projects to limit the number of runners per job. Additionally, > the [GHA policy|https://infra.apache.org/github-actions-policy.html] is > referenced which I wasn't aware of when working on the action workflow. > This issue is about applying the policy to the Flink GHA workflows. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34989) Apache Infra requests to reduce the runner usage for a project
Matthias Pohl created FLINK-34989: - Summary: Apache Infra requests to reduce the runner usage for a project Key: FLINK-34989 URL: https://issues.apache.org/jira/browse/FLINK-34989 Project: Flink Issue Type: Sub-task Components: Build System / CI Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl The GitHub Actions CI utilizes runners that are hosted by Apache Infra right now. These runners are limited. The runner usage can be monitored via the following links: * [Flink-specific report|https://infra-reports.apache.org/#ghactions=flink=168] (needs ASF committer rights) This project-specific report can only be modified through the HTTP GET parameters of the URL. * [Global report|https://infra-reports.apache.org/#ghactions] (needs ASF membership) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34427) FineGrainedSlotManagerTest fails fatally (exit code 239)
[ https://issues.apache.org/jira/browse/FLINK-34427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833098#comment-17833098 ] Matthias Pohl commented on FLINK-34427: --- Copied over from FLINK-33416: * https://github.com/XComp/flink/actions/runs/6472726326/job/17575765131 * 1.19: https://github.com/apache/flink/actions/runs/8467681781/job/23199435037#step:10:8909 > FineGrainedSlotManagerTest fails fatally (exit code 239) > > > Key: FLINK-34427 > URL: https://issues.apache.org/jira/browse/FLINK-34427 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Critical > Labels: pull-request-available, test-stability > > https://github.com/apache/flink/actions/runs/7866453350/job/21460921911#step:10:8959 > {code} > Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 > Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: > Error: 02:28:53 02:28:53.220 [ERROR] > org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest > Error: 02:28:53 02:28:53.220 [ERROR] > org.apache.maven.surefire.booter.SurefireBooterForkException: > ExecutionException The forked VM terminated without properly saying goodbye. > VM crash or System.exit called? 
> Error: 02:28:53 02:28:53.220 [ERROR] Command was /bin/sh -c cd > '/root/flink/flink-runtime' && > '/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java' '-XX:+UseG1GC' '-Xms256m' > '-XX:+IgnoreUnrecognizedVMOptions' > '--add-opens=java.base/java.util=ALL-UNNAMED' > '--add-opens=java.base/java.lang=ALL-UNNAMED' > '--add-opens=java.base/java.net=ALL-UNNAMED' > '--add-opens=java.base/java.io=ALL-UNNAMED' > '--add-opens=java.base/java.util.concurrent=ALL-UNNAMED' '-Xmx768m' '-jar' > '/root/flink/flink-runtime/target/surefire/surefirebooter-20240212022332296_94.jar' > '/root/flink/flink-runtime/target/surefire' > '2024-02-12T02-21-39_495-jvmRun3' 'surefire-20240212022332296_88tmp' > 'surefire_26-20240212022332296_91tmp' > Error: 02:28:53 02:28:53.220 [ERROR] Error occurred in starting fork, check > output in log > Error: 02:28:53 02:28:53.220 [ERROR] Process Exit Code: 239 > Error: 02:28:53 02:28:53.220 [ERROR] Crashed tests: > Error: 02:28:53 02:28:53.221 [ERROR] > org.apache.flink.runtime.resourcemanager.ResourceManagerTaskExecutorTest > Error: 02:28:53 02:28:53.221 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456) > [...] > {code} > The fatal error is triggered most likely within the > {{FineGrainedSlotManagerTest}}: > {code} > 02:26:39,362 [ pool-643-thread-1] ERROR > org.apache.flink.util.FatalExitExceptionHandler [] - FATAL: > Thread 'pool-643-thread-1' produced an uncaught exception. Stopping the > process... 
> java.util.concurrent.CompletionException: > java.util.concurrent.RejectedExecutionException: Task > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@4bbc0b10 > rejected from > java.util.concurrent.ScheduledThreadPoolExecutor@7a45cd9a[Shutting down, pool > size = 1, active threads = 1, queued tasks = 1, completed tasks = 194] > at > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:838) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:851) > ~[?:1.8.0_392] > at > java.util.concurrent.CompletableFuture.handleAsync(CompletableFuture.java:2178) > ~[?:1.8.0_392] > at > org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer.allocateSlot(DefaultSlotStatusSyncer.java:138) > ~[classes/:?] > at > org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.allocateSlotsAccordingTo(FineGrainedSlotManager.java:722) > ~[classes/:?] > at > org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.checkResourceRequirements(FineGrainedSlotManager.java:645) > ~[classes/:?] > at > org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.lambda$null$12(FineGrainedSlotManager.java:603) > ~[classes/:?] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [?:1.8.0_392] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > [?:1.8.0_392] > at >
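The RejectedExecutionException in the stack trace above is the generic failure mode of handing a task to an executor that is already shutting down: the default AbortPolicy rejects it. A minimal standalone reproduction, using only JDK classes and unrelated to Flink's own code, looks like this:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Minimal reproduction of the failure mode in the stack trace above:
// scheduling a task on a ScheduledThreadPoolExecutor after shutdown()
// is rejected by the default AbortPolicy.
public class RejectedScheduleDemo {
    public static void main(String[] args) {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        pool.shutdown();
        try {
            pool.schedule(() -> System.out.println("never runs"), 10, TimeUnit.MILLISECONDS);
            System.out.println("unexpectedly accepted");
        } catch (RejectedExecutionException expected) {
            // In the Flink test run this surfaced as an uncaught exception,
            // logged by FatalExitExceptionHandler before the process stopped.
            System.out.println("rejected: executor is shutting down");
        }
    }
}
```

In the test above the exception escapes onto a pool thread instead of being caught, which is why the uncaught-exception handler tears the fork down.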
[jira] [Closed] (FLINK-33416) FineGrainedSlotManagerTest failed with fatal error
[ https://issues.apache.org/jira/browse/FLINK-33416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl closed FLINK-33416. - Resolution: Duplicate This issue is addressed in FLINK-34427. I'm closing FLINK-33416 in favor of FLINK-34427 because the investigation happened there. > FineGrainedSlotManagerTest failed with fatal error > -- > > Key: FLINK-33416 > URL: https://issues.apache.org/jira/browse/FLINK-33416 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Reporter: Matthias Pohl >Priority: Major > Labels: github-actions, test-stability > > In FLINK-33245, we reported an error of the > {{ZooKeeperLeaderElectionConnectionHandlingTest}} failure due to a fatal > error. The corresponding build is [this > one|https://github.com/XComp/flink/actions/runs/6472726326/job/17575765131]. > But the stacktrace indicates that it's actually > {{FineGrainedSlotManagerTest}} which ran before the ZK-related test: > {code} > Test > org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManagerTest.testSlotAllocationAccordingToStrategyResult[testSlotAllocationAccordingToStrategyResult()] > successfully run. > > 19:30:11,463 [ pool-752-thread-1] ERROR > org.apache.flink.util.FatalExitExceptionHandler [] - FATAL: > Thread 'pool-752-thread-1' produced an uncaught exception. Stopping the > process... > java.util.concurrent.CompletionException: > java.util.concurrent.RejectedExecutionException: Task > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@1201ef67[Not > completed, task = > java.util.concurrent.Executors$RunnableAdapter@1ea6ccfa[Wrapped task = > java.util.concurrent.CompletableFuture$UniHandle@36f84d94]] rejected from > java.util.concurrent.ScheduledThreadPoolExecutor@4642c78d[Shutting down, pool > size = 1, active threads = 1, queued tasks = 1, completed tasks = 194] > at > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) > ~[?:?] 
> at > java.util.concurrent.CompletableFuture.uniHandleStage(CompletableFuture.java:951) > ~[?:?] > at > java.util.concurrent.CompletableFuture.handleAsync(CompletableFuture.java:2276) > ~[?:?] > at > org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer.allocateSlot(DefaultSlotStatusSyncer.java:138) > ~[classes/:?] > at > org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.allocateSlotsAccordingTo(FineGrainedSlotManager.java:722) > ~[classes/:?] > at > org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.checkResourceRequirements(FineGrainedSlotManager.java:645) > ~[classes/:?] > at > org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager.lambda$checkResourceRequirementsWithDelay$12(FineGrainedSlotManager.java:603) > ~[classes/:?] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?] > at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) > [?:?] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > [?:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > [?:?] > at java.lang.Thread.run(Thread.java:829) [?:?] > Caused by: java.util.concurrent.RejectedExecutionException: Task > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@1201ef67[Not > completed, task = > java.util.concurrent.Executors$RunnableAdapter@1ea6ccfa[Wrapped task = > java.util.concurrent.CompletableFuture$UniHandle@36f84d94]] rejected from > java.util.concurrent.ScheduledThreadPoolExecutor@4642c78d[Shutting down, pool > size = 1, active threads = 1, queued tasks = 1, completed tasks = 194] > at > java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055) > ~[?:?] 
> at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825) > ~[?:?] > at > java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:340) > ~[?:?] > at > java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:562) > ~[?:?] > at > java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:705) > ~[?:?] > at > java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:687) > ~[?:?] > at >
[jira] [Comment Edited] (FLINK-34988) Class loading issues in JDK17 and JDK21
[ https://issues.apache.org/jira/browse/FLINK-34988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833095#comment-17833095 ] Matthias Pohl edited comment on FLINK-34988 at 4/2/24 10:07 AM: It's most likely caused by FLINK-34548 based on the git history between the most recent successful nightly run on master [20240331.1|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58645=results] (based on {{3841f062}}) and [20240402.1|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676=results] (based on {{d271495c}}): {code} $ git log 3841f062..d271495c --oneline d271495c5be [hotfix] Fix compile error in DataStreamV2SinkTransformation 28762497bdf [FLINK-34548][API] Supports sink-v2 Sink 056660e0b69 [FLINK-34548][API] Supports FLIP-27 Source ceafa5a5705 [FLINK-34548][API] Implement datastream 4f71c5b4660 [FLINK-34548][API] Implement process function's underlying operators e1147ca7e39 [FLINK-34548][API] Introduce ExecutionEnvironment 9fa74a8a706 [FLINK-34548][API] Introduce stream interface and move KeySelector to flink-core-api cedbcce6eff [FLINK-34548][API] Introduce variants of ProcessFunction 13cfaa76b5e [FLINK-34548][API] Introduce ProcessFunction and RuntimeContext related interfaces 13790e03207 [FLINK-34548][API] Move Function interface to flink-core-api 59525e460af [FLINK-34548][API] Create flink-core-api module and let flink-core depend on it 5b2e923be0a [FLINK-34548][API] Initialize the datastream v2 related modules {code} was (Author: mapohl): It's most likely caused by FLINK-34548 based on the git history between the most recent successful nightly run on master [20240331.1|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58645=results] (based on {{3841f062}}) and [20240402.1|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676=results] (based on {{d271495c}}): {code} $ git log 3841f062..d271495c5be34f4e4a518207ca7716f4e8907e5f --oneline d271495c5be 
[hotfix] Fix compile error in DataStreamV2SinkTransformation 28762497bdf [FLINK-34548][API] Supports sink-v2 Sink 056660e0b69 [FLINK-34548][API] Supports FLIP-27 Source ceafa5a5705 [FLINK-34548][API] Implement datastream 4f71c5b4660 [FLINK-34548][API] Implement process function's underlying operators e1147ca7e39 [FLINK-34548][API] Introduce ExecutionEnvironment 9fa74a8a706 [FLINK-34548][API] Introduce stream interface and move KeySelector to flink-core-api cedbcce6eff [FLINK-34548][API] Introduce variants of ProcessFunction 13cfaa76b5e [FLINK-34548][API] Introduce ProcessFunction and RuntimeContext related interfaces 13790e03207 [FLINK-34548][API] Move Function interface to flink-core-api 59525e460af [FLINK-34548][API] Create flink-core-api module and let flink-core depend on it 5b2e923be0a [FLINK-34548][API] Initialize the datastream v2 related modules {code} > Class loading issues in JDK17 and JDK21 > --- > > Key: FLINK-34988 > URL: https://issues.apache.org/jira/browse/FLINK-34988 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Affects Versions: 1.20.0 >Reporter: Matthias Pohl >Priority: Major > Labels: test-stability > > * JDK 17 (core; NoClassDefFoundError caused by ExceptionInInitializeError): > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676=logs=675bf62c-8558-587e-2555-dcad13acefb5=5878eed3-cc1e-5b12-1ed0-9e7139ce0992=12942 > * JDK 17 (misc; ExceptionInInitializeError): > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676=logs=d871f0ce-7328-5d00-023b-e7391f5801c8=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6=22548 > * JDK 21 (core; same as above): > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676=logs=d06b80b4-9e88-5d40-12a2-18072cf60528=609ecd5a-3f6e-5d0c-2239-2096b155a4d0=12963 > * JDK 21 (misc; same as above): > 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676=logs=59a2b95a-736b-5c46-b3e0-cee6e587fd86=c301da75-e699-5c06-735f-778207c16f50=22506 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34988) Class loading issues in JDK17 and JDK21
[ https://issues.apache.org/jira/browse/FLINK-34988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833095#comment-17833095 ] Matthias Pohl commented on FLINK-34988: --- It's most likely caused by FLINK-34548 based on the git history between the most recent successful nightly run on master [20240331.1|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58645=results] (based on {{3841f062}}) and [20240402.1|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676=results] (based on {{d271495c}}): {code} $ git log 3841f062..d271495c5be34f4e4a518207ca7716f4e8907e5f --oneline d271495c5be [hotfix] Fix compile error in DataStreamV2SinkTransformation 28762497bdf [FLINK-34548][API] Supports sink-v2 Sink 056660e0b69 [FLINK-34548][API] Supports FLIP-27 Source ceafa5a5705 [FLINK-34548][API] Implement datastream 4f71c5b4660 [FLINK-34548][API] Implement process function's underlying operators e1147ca7e39 [FLINK-34548][API] Introduce ExecutionEnvironment 9fa74a8a706 [FLINK-34548][API] Introduce stream interface and move KeySelector to flink-core-api cedbcce6eff [FLINK-34548][API] Introduce variants of ProcessFunction 13cfaa76b5e [FLINK-34548][API] Introduce ProcessFunction and RuntimeContext related interfaces 13790e03207 [FLINK-34548][API] Move Function interface to flink-core-api 59525e460af [FLINK-34548][API] Create flink-core-api module and let flink-core depend on it 5b2e923be0a [FLINK-34548][API] Initialize the datastream v2 related modules {code} > Class loading issues in JDK17 and JDK21 > --- > > Key: FLINK-34988 > URL: https://issues.apache.org/jira/browse/FLINK-34988 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Affects Versions: 1.20.0 >Reporter: Matthias Pohl >Priority: Major > Labels: test-stability > > * JDK 17 (core; NoClassDefFoundError caused by ExceptionInInitializeError): > 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676&view=logs&j=675bf62c-8558-587e-2555-dcad13acefb5&t=5878eed3-cc1e-5b12-1ed0-9e7139ce0992&l=12942 > * JDK 17 (misc; ExceptionInInitializerError): > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676&view=logs&j=d871f0ce-7328-5d00-023b-e7391f5801c8&t=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6&l=22548 > * JDK 21 (core; same as above): > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676&view=logs&j=d06b80b4-9e88-5d40-12a2-18072cf60528&t=609ecd5a-3f6e-5d0c-2239-2096b155a4d0&l=12963 > * JDK 21 (misc; same as above): > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676&view=logs&j=59a2b95a-736b-5c46-b3e0-cee6e587fd86&t=c301da75-e699-5c06-735f-778207c16f50&l=22506
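The error pair in this report (a NoClassDefFoundError caused by an ExceptionInInitializerError) is the JVM's standard behavior when a static initializer fails: the first access throws ExceptionInInitializerError, and every later access of the now-erroneous class throws NoClassDefFoundError. A minimal stdlib-only sketch (illustrative, not Flink code):

```java
// Demonstrates why one failed static initializer can surface later as
// NoClassDefFoundError across a test run, matching the errors in this report.
public class InitFailureDemo {
    static class Broken {
        // Initialized via a method call, so it is not a compile-time constant
        // and the class must be initialized on first access.
        static final int VALUE = compute();
        static int compute() { throw new RuntimeException("init failed"); }
    }

    /** Accesses Broken.VALUE twice and records which error each access throws. */
    public static String[] probe() {
        String first = "", second = "";
        try { int v = Broken.VALUE; } catch (Throwable t) { first = t.getClass().getSimpleName(); }
        try { int v = Broken.VALUE; } catch (Throwable t) { second = t.getClass().getSimpleName(); }
        return new String[] {first, second};
    }

    public static void main(String[] args) {
        String[] result = probe();
        System.out.println("first access:  " + result[0]);
        System.out.println("second access: " + result[1]);
    }
}
```

This is why only the first failing test in a run shows the root cause; the rest only see NoClassDefFoundError.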
[jira] [Created] (FLINK-34988) Class loading issues in JDK17 and JDK21
Matthias Pohl created FLINK-34988: - Summary: Class loading issues in JDK17 and JDK21 Key: FLINK-34988 URL: https://issues.apache.org/jira/browse/FLINK-34988 Project: Flink Issue Type: Bug Components: API / DataStream Affects Versions: 1.20.0 Reporter: Matthias Pohl * JDK 17 (core; NoClassDefFoundError caused by ExceptionInInitializerError): https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676&view=logs&j=675bf62c-8558-587e-2555-dcad13acefb5&t=5878eed3-cc1e-5b12-1ed0-9e7139ce0992&l=12942 * JDK 17 (misc; ExceptionInInitializerError): https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676&view=logs&j=d871f0ce-7328-5d00-023b-e7391f5801c8&t=77cbea27-feb9-5cf5-53f7-3267f9f9c6b6&l=22548 * JDK 21 (core; same as above): https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676&view=logs&j=d06b80b4-9e88-5d40-12a2-18072cf60528&t=609ecd5a-3f6e-5d0c-2239-2096b155a4d0&l=12963 * JDK 21 (misc; same as above): https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58676&view=logs&j=59a2b95a-736b-5c46-b3e0-cee6e587fd86&t=c301da75-e699-5c06-735f-778207c16f50&l=22506
[jira] [Updated] (FLINK-33816) SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain failed due to async checkpoint triggering not being completed
[ https://issues.apache.org/jira/browse/FLINK-33816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-33816: -- Fix Version/s: 1.19.1 > SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain failed due > async checkpoint triggering not being completed > - > > Key: FLINK-33816 > URL: https://issues.apache.org/jira/browse/FLINK-33816 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Checkpointing, Runtime / Coordination >Affects Versions: 1.19.0 >Reporter: Matthias Pohl >Assignee: jiabao.sun >Priority: Major > Labels: github-actions, pull-request-available, test-stability > Fix For: 1.20.0, 1.19.1 > > Attachments: screenshot-1.png > > > [https://github.com/XComp/flink/actions/runs/7182604625/job/19559947894#step:12:9430] > {code:java} > rror: 14:39:01 14:39:01.930 [ERROR] Tests run: 16, Failures: 1, Errors: 0, > Skipped: 0, Time elapsed: 1.878 s <<< FAILURE! - in > org.apache.flink.streaming.runtime.tasks.SourceStreamTaskTest > 9426Error: 14:39:01 14:39:01.930 [ERROR] > org.apache.flink.streaming.runtime.tasks.SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain > Time elapsed: 0.034 s <<< FAILURE! > 9427Dec 12 14:39:01 org.opentest4j.AssertionFailedError: > 9428Dec 12 14:39:01 > 9429Dec 12 14:39:01 Expecting value to be true but was false > 9430Dec 12 14:39:01 at > java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62) > 9431Dec 12 14:39:01 at > java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502) > 9432Dec 12 14:39:01 at > org.apache.flink.streaming.runtime.tasks.SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain(SourceStreamTaskTest.java:710) > 9433Dec 12 14:39:01 at > java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > 9434Dec 12 14:39:01 at > java.base/java.lang.reflect.Method.invoke(Method.java:580) > [...] 
{code}
[jira] [Commented] (FLINK-33816) SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain failed due to async checkpoint triggering not being completed
[ https://issues.apache.org/jira/browse/FLINK-33816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833056#comment-17833056 ] Matthias Pohl commented on FLINK-33816: --- master: [5aebb04b3055fbec6a74eaf4226c4a88d3fd2d6e|https://github.com/apache/flink/commit/5aebb04b3055fbec6a74eaf4226c4a88d3fd2d6e] 1.19: [ece4faee055b3797b39e9c0b55f3e94a3db2f912|https://github.com/apache/flink/commit/ece4faee055b3797b39e9c0b55f3e94a3db2f912] > SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain failed due > async checkpoint triggering not being completed > - > > Key: FLINK-33816 > URL: https://issues.apache.org/jira/browse/FLINK-33816 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Checkpointing, Runtime / Coordination >Affects Versions: 1.19.0 >Reporter: Matthias Pohl >Assignee: jiabao.sun >Priority: Major > Labels: github-actions, pull-request-available, test-stability > Fix For: 1.20.0 > > Attachments: screenshot-1.png > > > [https://github.com/XComp/flink/actions/runs/7182604625/job/19559947894#step:12:9430] > {code:java} > rror: 14:39:01 14:39:01.930 [ERROR] Tests run: 16, Failures: 1, Errors: 0, > Skipped: 0, Time elapsed: 1.878 s <<< FAILURE! - in > org.apache.flink.streaming.runtime.tasks.SourceStreamTaskTest > 9426Error: 14:39:01 14:39:01.930 [ERROR] > org.apache.flink.streaming.runtime.tasks.SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain > Time elapsed: 0.034 s <<< FAILURE! 
> 9427Dec 12 14:39:01 org.opentest4j.AssertionFailedError: > 9428Dec 12 14:39:01 > 9429Dec 12 14:39:01 Expecting value to be true but was false > 9430Dec 12 14:39:01 at > java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62) > 9431Dec 12 14:39:01 at > java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502) > 9432Dec 12 14:39:01 at > org.apache.flink.streaming.runtime.tasks.SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain(SourceStreamTaskTest.java:710) > 9433Dec 12 14:39:01 at > java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > 9434Dec 12 14:39:01 at > java.base/java.lang.reflect.Method.invoke(Method.java:580) > [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
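The failure pattern above ("Expecting value to be true but was false" right after triggering an asynchronous stop-with-savepoint) is the classic one-shot assertion on an async action. The usual cure is a bounded wait on the condition; a minimal stdlib-only sketch of that pattern (illustrative, not the actual fix in the linked commits):

```java
// Polls a condition until it holds or the timeout elapses — the standard way
// to stabilize tests that assert on the outcome of an asynchronous trigger.
import java.time.Duration;
import java.util.function.BooleanSupplier;

public class AwaitUtil {
    /** Returns true as soon as the condition holds; false if the timeout elapses first. */
    public static boolean await(BooleanSupplier condition, Duration timeout)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(10); // back off briefly between checks
        }
        return condition.getAsBoolean(); // one final check at the deadline
    }
}
```

A test would then replace an immediate `assertThat(triggered).isTrue()` with something like `assertThat(AwaitUtil.await(triggered::get, Duration.ofSeconds(10))).isTrue()`.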
[jira] [Commented] (FLINK-34953) Add github ci for flink-web to auto commit build files
[ https://issues.apache.org/jira/browse/FLINK-34953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833036#comment-17833036 ] Matthias Pohl commented on FLINK-34953: --- Hi [~gongzhongqiang], it sounds like we reached consensus in this matter already. But you can bring this up in the dev ML to check whether there are objections against this approach before going ahead with this ticket, to have proper backing from the community. > Add github ci for flink-web to auto commit build files > -- > > Key: FLINK-34953 > URL: https://issues.apache.org/jira/browse/FLINK-34953 > Project: Flink > Issue Type: Improvement > Components: Project Website >Reporter: Zhongqiang Gong >Priority: Minor > Labels: website > > Currently, https://github.com/apache/flink-web commits build files from local builds. So I want to use GitHub CI to build the docs and commit them. > > Changes: > * Add a website build check for PRs > * Automatically build and commit build files after a PR is merged to `asf-site` > * Optional: this CI can be triggered manually
[jira] [Updated] (FLINK-34961) GitHub Actions runner statistics can be monitored per workflow name
[ https://issues.apache.org/jira/browse/FLINK-34961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34961: -- Labels: starter (was: ) > GitHub Actions runner statistics can be monitored per workflow name > -- > > Key: FLINK-34961 > URL: https://issues.apache.org/jira/browse/FLINK-34961 > Project: Flink > Issue Type: Improvement > Components: Build System / CI >Reporter: Matthias Pohl >Priority: Major > Labels: starter > > Apache Infra allows the monitoring of runner usage per workflow (see [report > for > Flink|https://infra-reports.apache.org/#ghactions=flink=168=10]; > only accessible with Apache committer rights). They accumulate the data by > workflow name. The Flink space has multiple repositories that use the generic > workflow name {{CI}}. That makes the differentiation in the report harder. > This Jira issue is about identifying all Flink-related projects with a CI > workflow (Kubernetes operator and the JDBC connector were identified, for > instance) and adding a more distinct name.
[jira] [Created] (FLINK-34961) GitHub Actions statistcs can be monitored per workflow name
[jira] [Created] (FLINK-34961) GitHub Actions statistics can be monitored per workflow name
[jira] [Updated] (FLINK-34961) GitHub Actions runner statistics can be monitored per workflow name
[ https://issues.apache.org/jira/browse/FLINK-34961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34961: -- Summary: GitHub Actions runner statistics can be monitored per workflow name (was: GitHub Actions statistics can be monitored per workflow name) > GitHub Actions runner statistics can be monitored per workflow name > -- > > Key: FLINK-34961 > URL: https://issues.apache.org/jira/browse/FLINK-34961 > Project: Flink > Issue Type: Improvement > Components: Build System / CI >Reporter: Matthias Pohl >Priority: Major > > Apache Infra allows the monitoring of runner usage per workflow (see [report > for > Flink|https://infra-reports.apache.org/#ghactions=flink=168=10]; > only accessible with Apache committer rights). They accumulate the data by > workflow name. The Flink space has multiple repositories that use the generic > workflow name {{CI}}. That makes the differentiation in the report harder. > This Jira issue is about identifying all Flink-related projects with a CI > workflow (Kubernetes operator and the JDBC connector were identified, for > instance) and adding a more distinct name.
[jira] [Commented] (FLINK-34937) Apache Infra GHA policy update
[ https://issues.apache.org/jira/browse/FLINK-34937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831844#comment-17831844 ] Matthias Pohl commented on FLINK-34937: --- Looks like Flink is on rank 19 in terms of runner minutes used for the past 7 days: [Flink-specific report|https://infra-reports.apache.org/#ghactions=flink=168] (needs ASF committer rights) [Global report|https://infra-reports.apache.org/#ghactions] (needs ASF membership) > Apache Infra GHA policy update > -- > > Key: FLINK-34937 > URL: https://issues.apache.org/jira/browse/FLINK-34937 > Project: Flink > Issue Type: Sub-task > Components: Build System / CI >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Major > > There is a policy update [announced in the infra > ML|https://www.mail-archive.com/jdo-dev@db.apache.org/msg13638.html] which > asked Apache projects to limit the number of runners per job. Additionally, > the [GHA policy|https://infra.apache.org/github-actions-policy.html] is > referenced which I wasn't aware of when working on the action workflow. > This issue is about applying the policy to the Flink GHA workflows. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (FLINK-34933) JobMasterServiceLeadershipRunnerTest#testResultFutureCompletionOfOutdatedLeaderIsIgnored isn't implemented properly
[ https://issues.apache.org/jira/browse/FLINK-34933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl resolved FLINK-34933. --- Fix Version/s: 1.18.2 1.20.0 1.19.1 Resolution: Fixed master: [1668a07276929416469392a35a77ba7699aac30b|https://github.com/apache/flink/commit/1668a07276929416469392a35a77ba7699aac30b] 1.19: [c11656a2406f07e2ae7cd6f80c46afb14385ee0e|https://github.com/apache/flink/commit/c11656a2406f07e2ae7cd6f80c46afb14385ee0e] 1.18: [94d1363c27e26fc8313721e138c7b4de744ca69e|https://github.com/apache/flink/commit/94d1363c27e26fc8313721e138c7b4de744ca69e] > JobMasterServiceLeadershipRunnerTest#testResultFutureCompletionOfOutdatedLeaderIsIgnored > isn't implemented properly > --- > > Key: FLINK-34933 > URL: https://issues.apache.org/jira/browse/FLINK-34933 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Major > Labels: pull-request-available > Fix For: 1.18.2, 1.20.0, 1.19.1 > > > {{testResultFutureCompletionOfOutdatedLeaderIsIgnored}} doesn't test the > desired behavior: The {{TestingJobMasterService#closeAsync()}} callback > throws an {{UnsupportedOperationException}} by default which prevents the > test from properly finalizing the leadership revocation. > The test is still passing because the test checks implicitly for this error. > Instead, we should verify that the runner's resultFuture doesn't complete > until the runner is closed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
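The contract described in the resolved ticket — the runner's result future must stay incomplete until the runner is closed — can be sketched with plain CompletableFuture. `Runner` below is a hypothetical stand-in for illustration, not Flink's JobMasterServiceLeadershipRunner:

```java
// Sketches the stricter test the ticket asks for: leadership revocation alone
// must not complete the result future; only close() may.
import java.util.concurrent.CompletableFuture;

public class RunnerCloseSketch {
    static final class Runner implements AutoCloseable {
        private final CompletableFuture<Void> resultFuture = new CompletableFuture<>();

        CompletableFuture<Void> getResultFuture() { return resultFuture; }

        void revokeLeadership() { /* must NOT complete resultFuture */ }

        @Override
        public void close() { resultFuture.complete(null); }
    }

    public static void main(String[] args) {
        Runner runner = new Runner();
        runner.revokeLeadership();
        // still pending after leadership revocation ...
        if (runner.getResultFuture().isDone()) throw new AssertionError("completed too early");
        runner.close();
        // ... and completed only once the runner is closed
        if (!runner.getResultFuture().isDone()) throw new AssertionError("not completed on close");
    }
}
```

The point of asserting `isDone()` at both steps is that a default callback throwing UnsupportedOperationException (as in the original test) can mask the early completion the test is supposed to rule out.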
[jira] [Resolved] (FLINK-33376) Extend Curator config option for Zookeeper configuration
[ https://issues.apache.org/jira/browse/FLINK-33376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl resolved FLINK-33376. --- Fix Version/s: 1.20.0 Release Note: Adds support for the following curator parameters: high-availability.zookeeper.client.authorization (curator parameter: authorization), high-availability.zookeeper.client.max-close-wait (curator parameter: maxCloseWaitMs), high-availability.zookeeper.client.simulated-session-expiration-percent (curator parameter: simulatedSessionExpirationPercent) Resolution: Fixed master: [83f82ab0c865a4fa9e119c96e11e0fb3df4a5ecd|https://github.com/apache/flink/commit/83f82ab0c865a4fa9e119c96e11e0fb3df4a5ecd] > Extend Curator config option for Zookeeper configuration > > > Key: FLINK-33376 > URL: https://issues.apache.org/jira/browse/FLINK-33376 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Reporter: Oleksandr Nitavskyi >Assignee: Oleksandr Nitavskyi >Priority: Major > Labels: pull-request-available > Fix For: 1.20.0 > > > In certain cases ZooKeeper requires additional Authentication information. > For example list of valid [names for > ensemble|https://zookeeper.apache.org/doc/r3.8.0/zookeeperAdmin.html#:~:text=for%20secure%20authentication.-,zookeeper.ensembleAuthName,-%3A%20(Java%20system%20property] > in order to prevent the accidental connecting to a wrong ensemble. > Curator allows to add additional AuthInfo object for such configuration. Thus > it would be useful to add one more additional Map property which would allow > to pass AuthInfo objects during Curator client creation. > *Acceptance Criteria:* For Flink users it is possible to configure auth info > list for Curator framework client. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33376) Extend Curator config option for Zookeeper configuration
[ https://issues.apache.org/jira/browse/FLINK-33376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-33376: -- Release Note: Adds support for the following curator parameters: high-availability.zookeeper.client.authorization (corresponding curator parameter: authorization), high-availability.zookeeper.client.max-close-wait (corresponding curator parameter: maxCloseWaitMs), high-availability.zookeeper.client.simulated-session-expiration-percent (corresponding curator parameter: simulatedSessionExpirationPercent). (was: Adds support for the following curator parameters: high-availability.zookeeper.client.authorization (curator parameter: authorization), high-availability.zookeeper.client.max-close-wait (curator parameter: maxCloseWaitMs), high-availability.zookeeper.client.simulated-session-expiration-percent (curator parameter: simulatedSessionExpirationPercent)) > Extend Curator config option for Zookeeper configuration > > > Key: FLINK-33376 > URL: https://issues.apache.org/jira/browse/FLINK-33376 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Reporter: Oleksandr Nitavskyi >Assignee: Oleksandr Nitavskyi >Priority: Major > Labels: pull-request-available > Fix For: 1.20.0 > > > In certain cases ZooKeeper requires additional Authentication information. > For example list of valid [names for > ensemble|https://zookeeper.apache.org/doc/r3.8.0/zookeeperAdmin.html#:~:text=for%20secure%20authentication.-,zookeeper.ensembleAuthName,-%3A%20(Java%20system%20property] > in order to prevent the accidental connecting to a wrong ensemble. > Curator allows to add additional AuthInfo object for such configuration. Thus > it would be useful to add one more additional Map property which would allow > to pass AuthInfo objects during Curator client creation. > *Acceptance Criteria:* For Flink users it is possible to configure auth info > list for Curator framework client. 
[jira] [Reopened] (FLINK-34953) Add github ci for flink-web to auto commit build files
[ https://issues.apache.org/jira/browse/FLINK-34953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl reopened FLINK-34953: --- > Add github ci for flink-web to auto commit build files > -- > > Key: FLINK-34953 > URL: https://issues.apache.org/jira/browse/FLINK-34953 > Project: Flink > Issue Type: Improvement > Components: Project Website >Reporter: Zhongqiang Gong >Priority: Minor > Labels: website > > Currently, https://github.com/apache/flink-web commits build files from local builds. So I want to use GitHub CI to build the docs and commit them. > > Changes: > * Add a website build check for PRs > * Automatically build and commit build files after a PR is merged to `asf-site` > * Optional: this CI can be triggered manually
[jira] [Comment Edited] (FLINK-34953) Add github ci for flink-web to auto commit build files
[ https://issues.apache.org/jira/browse/FLINK-34953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831665#comment-17831665 ] Matthias Pohl edited comment on FLINK-34953 at 3/28/24 9:52 AM: I guess we could do it. The [GitHub Actions Policy|https://infra.apache.org/github-actions-policy.html] excludes non-released artifacts like websites from the restriction: {quote}Automated services such as GitHub Actions (and Jenkins, BuildBot, etc.) MAY work on website content and other non-released data such as documentation and convenience binaries. Automated services MUST NOT push data to a repository or branch that is subject to official release as a software package by the project, unless the project secures specific prior authorization of the workflow from Infrastructure. {quote} Not sure whether they updated that one recently. Or do you have another source which is stricter, [~martijnvisser] ? was (Author: mapohl): I guess we could do it. The [GitHub Actions Policy|https://infra.apache.org/github-actions-policy.html] excludes non-released artifacts like website from the restriction: {quote}Automated services such as GitHub Actions (and Jenkins, BuildBot, etc.) MAY work on website content and other non-released data such as documentation and convenience binaries. Automated services MUST NOT push data to a repository or branch that is subject to official release as a software package by the project, unless the project secures specific prior authorization of the workflow from Infrastructure. {quote} Not sure whether they updated that one recently. Or do you have another source which is stricter, [~martijnvisser] ? 
> Add github ci for flink-web to auto commit build files > -- > > Key: FLINK-34953 > URL: https://issues.apache.org/jira/browse/FLINK-34953 > Project: Flink > Issue Type: Improvement > Components: Project Website >Reporter: Zhongqiang Gong >Priority: Minor > Labels: website > > Currently, https://github.com/apache/flink-web commits build files from local builds. So I want to use GitHub CI to build the docs and commit them. > > Changes: > * Add a website build check for PRs > * Automatically build and commit build files after a PR is merged to `asf-site` > * Optional: this CI can be triggered manually
[jira] [Commented] (FLINK-34953) Add github ci for flink-web to auto commit build files
[ https://issues.apache.org/jira/browse/FLINK-34953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831665#comment-17831665 ] Matthias Pohl commented on FLINK-34953: --- I guess we could do it. The [GitHub Actions Policy|https://infra.apache.org/github-actions-policy.html] excludes non-released artifacts like website from the restriction: {quote}Automated services such as GitHub Actions (and Jenkins, BuildBot, etc.) MAY work on website content and other non-released data such as documentation and convenience binaries. Automated services MUST NOT push data to a repository or branch that is subject to official release as a software package by the project, unless the project secures specific prior authorization of the workflow from Infrastructure. {quote} Not sure whether they updated that one recently. Or do you have another source which is stricter, [~martijnvisser] ? > Add github ci for flink-web to auto commit build files > -- > > Key: FLINK-34953 > URL: https://issues.apache.org/jira/browse/FLINK-34953 > Project: Flink > Issue Type: Improvement > Components: Project Website >Reporter: Zhongqiang Gong >Priority: Minor > Labels: website > > Currently, https://github.com/apache/flink-web commits build files from local builds. So I want to use GitHub CI to build the docs and commit them. > > Changes: > * Add a website build check for PRs > * Automatically build and commit build files after a PR is merged to `asf-site` > * Optional: this CI can be triggered manually
[jira] [Commented] (FLINK-34937) Apache Infra GHA policy update
[ https://issues.apache.org/jira/browse/FLINK-34937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831659#comment-17831659 ] Matthias Pohl commented on FLINK-34937: --- let's check https://github.com/assignUser/stash (which is provided by [~assignuser] from the Apache Arrow project and promoted in Apache Infra's roundtable group) whether our CI can benefit from it > Apache Infra GHA policy update > -- > > Key: FLINK-34937 > URL: https://issues.apache.org/jira/browse/FLINK-34937 > Project: Flink > Issue Type: Sub-task > Components: Build System / CI >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Major > > There is a policy update [announced in the infra > ML|https://www.mail-archive.com/jdo-dev@db.apache.org/msg13638.html] which > asked Apache projects to limit the number of runners per job. Additionally, > the [GHA policy|https://infra.apache.org/github-actions-policy.html] is > referenced which I wasn't aware of when working on the action workflow. > This issue is about applying the policy to the Flink GHA workflows. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-34551) Align retry mechanisms of FutureUtils
[ https://issues.apache.org/jira/browse/FLINK-34551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl reassigned FLINK-34551: - Assignee: Matthias Pohl (was: Kumar Mallikarjuna) > Align retry mechanisms of FutureUtils > - > > Key: FLINK-34551 > URL: https://issues.apache.org/jira/browse/FLINK-34551 > Project: Flink > Issue Type: Technical Debt > Components: API / Core >Affects Versions: 1.20.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Major > Labels: pull-request-available > > The retry mechanisms of FutureUtils include quite a bit of redundant code > which makes it hard to understand and to extend. The logic should be aligned > properly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34551) Align retry mechanisms of FutureUtils
[ https://issues.apache.org/jira/browse/FLINK-34551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831657#comment-17831657 ] Matthias Pohl commented on FLINK-34551: --- The intention of this ticket came from FLINK-34227 where I wanted to add logic for retrying forever. I managed to split the {{retrySuccessfulOperationWithDelay}} in FLINK-34227 in a way now that I didn't generate too much additional redundant code. I created FLINK-34551 as a follow-up anyway because I noticed that {{retrySuccessfulOperationWithDelay}} and {{retryOperation}} share some common logic and that we could improve the way how these methods decide on which executor to run the {{operation}} on (scheduledExecutor vs calling thread). Your current proposal has still redundant code. We would need to iterate over the change a bit more and discuss the contract of these methods in more detail. But unfortunately, I am gone for quite a bit soon. So, I would not be able to help you. Additionally, it's not a high-priority task right. I'm wondering whether we should unassign the task again. I want to avoid that you spend time on it and then get stuck because of missing feedback from my side. I should have considered it yesterday already. Sorry for that. > Align retry mechanisms of FutureUtils > - > > Key: FLINK-34551 > URL: https://issues.apache.org/jira/browse/FLINK-34551 > Project: Flink > Issue Type: Technical Debt > Components: API / Core >Affects Versions: 1.20.0 >Reporter: Matthias Pohl >Assignee: Kumar Mallikarjuna >Priority: Major > Labels: pull-request-available > > The retry mechanisms of FutureUtils include quite a bit of redundant code > which makes it hard to understand and to extend. The logic should be aligned > properly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
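The comment above talks about unifying FutureUtils' retry variants (bounded retries, retry-forever, and the choice of executor). A minimal stdlib-only sketch of such a unified loop — hedged: this is not FutureUtils' actual API, and the "negative count means retry forever" encoding is an illustrative assumption:

```java
// One retry code path for bounded and unbounded retries; the executor that
// re-runs the operation (here: a ScheduledExecutorService) is an explicit
// parameter instead of being baked into several near-duplicate methods.
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class RetrySketch {
    public static <T> CompletableFuture<T> retryWithDelay(
            Supplier<CompletableFuture<T>> operation,
            int maxRetries, // a negative value encodes "retry forever"
            Duration delay,
            Predicate<Throwable> retryPredicate,
            ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(operation, maxRetries, delay, retryPredicate, scheduler, result);
        return result;
    }

    private static <T> void attempt(
            Supplier<CompletableFuture<T>> operation, int retriesLeft, Duration delay,
            Predicate<Throwable> retryPredicate, ScheduledExecutorService scheduler,
            CompletableFuture<T> result) {
        operation.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (retriesLeft != 0 && retryPredicate.test(error)) {
                // schedule the next attempt after the delay; a negative
                // retriesLeft stays negative and therefore never runs out
                scheduler.schedule(
                        () -> attempt(operation, retriesLeft - 1, delay,
                                retryPredicate, scheduler, result),
                        delay.toMillis(), TimeUnit.MILLISECONDS);
            } else {
                result.completeExceptionally(error);
            }
        });
    }
}
```

Folding the success/failure predicate and the executor choice into parameters is what removes the redundancy between methods like `retryOperation` and `retrySuccessfulOperationWithDelay`.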
[jira] [Comment Edited] (FLINK-34937) Apache Infra GHA policy update
[ https://issues.apache.org/jira/browse/FLINK-34937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831422#comment-17831422 ] Matthias Pohl edited comment on FLINK-34937 at 3/27/24 3:45 PM: We should pin all actions (i.e. use the git SHA rather than a version tag) for external actions (anything other than {{actions/\*}}, {{github/\*}} and {{apache/\*}} prefixed actions). That's not the case right now. was (Author: mapohl): We should pin all actions (i.e. use the git SHA rather than a version tag) for external actions (anything other than {{actions/*}}, {{github/*}} and {{apache/*}} prefixed actions). That's not the case right now. > Apache Infra GHA policy update > -- > > Key: FLINK-34937 > URL: https://issues.apache.org/jira/browse/FLINK-34937 > Project: Flink > Issue Type: Sub-task > Components: Build System / CI >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Major > > There is a policy update [announced in the infra > ML|https://www.mail-archive.com/jdo-dev@db.apache.org/msg13638.html] which > asked Apache projects to limit the number of runners per job. Additionally, > the [GHA policy|https://infra.apache.org/github-actions-policy.html] is > referenced which I wasn't aware of when working on the action workflow. > This issue is about applying the policy to the Flink GHA workflows. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34937) Apache Infra GHA policy update
[ https://issues.apache.org/jira/browse/FLINK-34937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831422#comment-17831422 ] Matthias Pohl commented on FLINK-34937: --- We should pin all actions (i.e. use the git SHA rather than a version tag) for external actions (anything other than {{actions/*}}, {{github/*}} and {{apache/*}} prefixed actions). That's not the case right now. > Apache Infra GHA policy update > -- > > Key: FLINK-34937 > URL: https://issues.apache.org/jira/browse/FLINK-34937 > Project: Flink > Issue Type: Sub-task > Components: Build System / CI >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Major > > There is a policy update [announced in the infra > ML|https://www.mail-archive.com/jdo-dev@db.apache.org/msg13638.html] which > asked Apache projects to limit the number of runners per job. Additionally, > the [GHA policy|https://infra.apache.org/github-actions-policy.html] is > referenced which I wasn't aware of when working on the action workflow. > This issue is about applying the policy to the Flink GHA workflows. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (FLINK-34419) flink-docker's .github/workflows/snapshot.yml doesn't support JDK 17 and 21
[ https://issues.apache.org/jira/browse/FLINK-34419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl resolved FLINK-34419. --- Resolution: Fixed > flink-docker's .github/workflows/snapshot.yml doesn't support JDK 17 and 21 > --- > > Key: FLINK-34419 > URL: https://issues.apache.org/jira/browse/FLINK-34419 > Project: Flink > Issue Type: Technical Debt > Components: Build System / CI >Reporter: Matthias Pohl >Assignee: Muhammet Orazov >Priority: Major > Labels: pull-request-available, starter > > [.github/workflows/snapshot.yml|https://github.com/apache/flink-docker/blob/master/.github/workflows/snapshot.yml#L40] > needs to be updated: JDK 17 support was added in 1.18 (FLINK-15736). JDK 21 > support was added in 1.19 (FLINK-33163) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-34419) flink-docker's .github/workflows/snapshot.yml doesn't support JDK 17 and 21
[ https://issues.apache.org/jira/browse/FLINK-34419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831391#comment-17831391 ] Matthias Pohl edited comment on FLINK-34419 at 3/27/24 2:56 PM: master: 9e0041a2c9dace4bf3f32815e3e24e24385b179b dev-master: 1460077743b29e17edd0a2d7efd3897fa097988d dev-1.19: 67d7c46ed382a665e941f0cf1f1606d10f87dee5 dev-1.18: d93d911b015e535fc2b6f1426c3b36229ff3d02a was (Author: mapohl): master: 9e0041a2c9dace4bf3f32815e3e24e24385b179b dev-master: tba dev-1.19: tba dev-1.18: tba > flink-docker's .github/workflows/snapshot.yml doesn't support JDK 17 and 21 > --- > > Key: FLINK-34419 > URL: https://issues.apache.org/jira/browse/FLINK-34419 > Project: Flink > Issue Type: Technical Debt > Components: Build System / CI >Reporter: Matthias Pohl >Assignee: Muhammet Orazov >Priority: Major > Labels: pull-request-available, starter > > [.github/workflows/snapshot.yml|https://github.com/apache/flink-docker/blob/master/.github/workflows/snapshot.yml#L40] > needs to be updated: JDK 17 support was added in 1.18 (FLINK-15736). JDK 21 > support was added in 1.19 (FLINK-33163) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34419) flink-docker's .github/workflows/snapshot.yml doesn't support JDK 17 and 21
[ https://issues.apache.org/jira/browse/FLINK-34419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831391#comment-17831391 ] Matthias Pohl commented on FLINK-34419: --- master: 9e0041a2c9dace4bf3f32815e3e24e24385b179b dev-master: tba dev-1.19: tba dev-1.18: tba > flink-docker's .github/workflows/snapshot.yml doesn't support JDK 17 and 21 > --- > > Key: FLINK-34419 > URL: https://issues.apache.org/jira/browse/FLINK-34419 > Project: Flink > Issue Type: Technical Debt > Components: Build System / CI >Reporter: Matthias Pohl >Assignee: Muhammet Orazov >Priority: Major > Labels: pull-request-available, starter > > [.github/workflows/snapshot.yml|https://github.com/apache/flink-docker/blob/master/.github/workflows/snapshot.yml#L40] > needs to be updated: JDK 17 support was added in 1.18 (FLINK-15736). JDK 21 > support was added in 1.19 (FLINK-33163) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (FLINK-34897) JobMasterServiceLeadershipRunnerTest#testJobMasterServiceLeadershipRunnerCloseWhenElectionServiceGrantLeaderShip needs to be enabled again
[ https://issues.apache.org/jira/browse/FLINK-34897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl resolved FLINK-34897. --- Fix Version/s: 1.18.2 1.20.0 1.19.1 Resolution: Fixed master: [0e70d89ad9f807a5816290e9808720e71bdad655|https://github.com/apache/flink/commit/0e70d89ad9f807a5816290e9808720e71bdad655] 1.19: [6b5c48ff53ddc6e75056a9050afded2ac44a413a|https://github.com/apache/flink/commit/6b5c48ff53ddc6e75056a9050afded2ac44a413a] 1.18: [a6aa569f5005041934a2e6398b6749584beeaabd|https://github.com/apache/flink/commit/a6aa569f5005041934a2e6398b6749584beeaabd] > JobMasterServiceLeadershipRunnerTest#testJobMasterServiceLeadershipRunnerCloseWhenElectionServiceGrantLeaderShip > needs to be enabled again > -- > > Key: FLINK-34897 > URL: https://issues.apache.org/jira/browse/FLINK-34897 > Project: Flink > Issue Type: Technical Debt > Components: Runtime / Coordination >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Major > Labels: pull-request-available > Fix For: 1.18.2, 1.20.0, 1.19.1 > > > While working on FLINK-34672 I noticed that > {{JobMasterServiceLeadershipRunnerTest#testJobMasterServiceLeadershipRunnerCloseWhenElectionServiceGrantLeaderShip}} > is disabled without a reason. > It looks like I disabled it accidentally as part of FLINK-31783. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34897) JobMasterServiceLeadershipRunnerTest#testJobMasterServiceLeadershipRunnerCloseWhenElectionServiceGrantLeaderShip needs to be enabled again
[ https://issues.apache.org/jira/browse/FLINK-34897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34897: -- Affects Version/s: (was: 1.17.2) > JobMasterServiceLeadershipRunnerTest#testJobMasterServiceLeadershipRunnerCloseWhenElectionServiceGrantLeaderShip > needs to be enabled again > -- > > Key: FLINK-34897 > URL: https://issues.apache.org/jira/browse/FLINK-34897 > Project: Flink > Issue Type: Technical Debt > Components: Runtime / Coordination >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Major > Labels: pull-request-available > > While working on FLINK-34672 I noticed that > {{JobMasterServiceLeadershipRunnerTest#testJobMasterServiceLeadershipRunnerCloseWhenElectionServiceGrantLeaderShip}} > is disabled without a reason. > It looks like I disabled it accidentally as part of FLINK-31783. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34551) Align retry mechanisms of FutureUtils
[ https://issues.apache.org/jira/browse/FLINK-34551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831232#comment-17831232 ] Matthias Pohl commented on FLINK-34551: --- The intention of the ticket is to remove the code redundancy, yes. I'm gonna assign the issue to you. > Align retry mechanisms of FutureUtils > - > > Key: FLINK-34551 > URL: https://issues.apache.org/jira/browse/FLINK-34551 > Project: Flink > Issue Type: Technical Debt > Components: API / Core >Affects Versions: 1.20.0 >Reporter: Matthias Pohl >Priority: Major > > The retry mechanisms of FutureUtils include quite a bit of redundant code > which makes it hard to understand and to extend. The logic should be aligned > properly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
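The redundancy discussed in FLINK-34551 can be illustrated with a minimal sketch in which every retry variant reduces to one shared core loop. This is a hypothetical illustration, not Flink's actual FutureUtils API; all names below (RetrySketch, retry, runAttempt) are made up for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Predicate;
import java.util.function.Supplier;

/**
 * Hypothetical sketch (not Flink's actual FutureUtils) of aligning several
 * retry helpers around a single core loop: run the operation, and on a
 * retryable failure re-schedule the next attempt on the given executor.
 */
public final class RetrySketch {
    public static <T> CompletableFuture<T> retry(
            Supplier<CompletableFuture<T>> operation,
            int retries,
            Predicate<Throwable> retryable,
            Executor executor) {
        CompletableFuture<T> result = new CompletableFuture<>();
        runAttempt(operation, retries, retryable, executor, result);
        return result;
    }

    private static <T> void runAttempt(
            Supplier<CompletableFuture<T>> operation,
            int retriesLeft,
            Predicate<Throwable> retryable,
            Executor executor,
            CompletableFuture<T> result) {
        operation.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (retriesLeft > 0 && retryable.test(error)) {
                // Re-schedule the next attempt instead of duplicating the loop
                // in every public variant (fixed-delay, predicate-based, ...).
                executor.execute(() ->
                        runAttempt(operation, retriesLeft - 1, retryable, executor, result));
            } else {
                result.completeExceptionally(error);
            }
        });
    }
}
```

Variants that differ only in delay strategy or retry predicate could then be thin wrappers around this one method, which is the kind of alignment the ticket asks for.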
[jira] [Assigned] (FLINK-34551) Align retry mechanisms of FutureUtils
[ https://issues.apache.org/jira/browse/FLINK-34551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl reassigned FLINK-34551: - Assignee: Kumar Mallikarjuna > Align retry mechanisms of FutureUtils > - > > Key: FLINK-34551 > URL: https://issues.apache.org/jira/browse/FLINK-34551 > Project: Flink > Issue Type: Technical Debt > Components: API / Core >Affects Versions: 1.20.0 >Reporter: Matthias Pohl >Assignee: Kumar Mallikarjuna >Priority: Major > > The retry mechanisms of FutureUtils include quite a bit of redundant code > which makes it hard to understand and to extend. The logic should be aligned > properly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34940) LeaderContender implementations handle invalid state
Matthias Pohl created FLINK-34940: - Summary: LeaderContender implementations handle invalid state Key: FLINK-34940 URL: https://issues.apache.org/jira/browse/FLINK-34940 Project: Flink Issue Type: Technical Debt Components: Runtime / Coordination Reporter: Matthias Pohl Currently, LeaderContender implementations (e.g. see [ResourceManagerServiceImplTest#grantLeadership_withExistingLeader_waitTerminationOfExistingLeader|https://github.com/apache/flink/blob/master/flink-runtime/src/test/java/org/apache/flink/runtime/resourcemanager/ResourceManagerServiceImplTest.java#L219]) allow handling leader events of the same type occurring back to back, which shouldn't be the case. Two subsequent leadership grants indicate that the leading instance, which received the leadership grant again, missed the leadership revocation event, causing an invalid state of the overall deployment (i.e. a split-brain scenario). We should fail fatally in these scenarios rather than handling them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
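The invariant described in FLINK-34940 (grant and revocation events must strictly alternate) can be sketched as a small guard that fails fatally on a repeated event. This is a hypothetical illustration, not Flink's actual LeaderContender API; the class and method names below are made up.

```java
/**
 * Hypothetical guard (not Flink's actual LeaderContender API) enforcing that
 * leadership grant and revocation events strictly alternate. A second grant
 * without an intermediate revocation indicates a missed revocation event,
 * i.e. a potential split-brain, and is treated as a fatal error.
 */
public final class LeadershipEventGuard {
    private boolean hasLeadership = false;

    public synchronized void onGrant() {
        if (hasLeadership) {
            // Two subsequent grants: the revocation event was missed.
            throw new IllegalStateException(
                    "Leadership granted twice without revocation - possible split brain");
        }
        hasLeadership = true;
    }

    public synchronized void onRevoke() {
        if (!hasLeadership) {
            throw new IllegalStateException(
                    "Leadership revoked without a preceding grant");
        }
        hasLeadership = false;
    }
}
```

The design choice sketched here matches the ticket's conclusion: rather than tolerating duplicate events, the contender surfaces them as fatal errors so the invalid deployment state is visible immediately.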
[jira] [Commented] (FLINK-34937) Apache Infra GHA policy update
[ https://issues.apache.org/jira/browse/FLINK-34937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830876#comment-17830876 ] Matthias Pohl commented on FLINK-34937: --- I updated the link to refer to a publicly available resource (y) I haven't gone through the policy in detail. We might have to get back to infra if things are unclear. For this, it might be worth it to respond in the [infra ML thread|https://lists.apache.org/thread/6qw21x44q88rc3mhkn42jgjjw94rsvb1] (for which you would have to subscribe) > Apache Infra GHA policy update > -- > > Key: FLINK-34937 > URL: https://issues.apache.org/jira/browse/FLINK-34937 > Project: Flink > Issue Type: Sub-task > Components: Build System / CI >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Major > > There is a policy update [announced in the infra > ML|https://www.mail-archive.com/jdo-dev@db.apache.org/msg13638.html] which > asked Apache projects to limit the number of runners per job. Additionally, > the [GHA policy|https://infra.apache.org/github-actions-policy.html] is > referenced which I wasn't aware of when working on the action workflow. > This issue is about applying the policy to the Flink GHA workflows. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34937) Apache Infra GHA policy update
[ https://issues.apache.org/jira/browse/FLINK-34937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34937: -- Description: There is a policy update [announced in the infra ML|https://www.mail-archive.com/jdo-dev@db.apache.org/msg13638.html] which asked Apache projects to limit the number of runners per job. Additionally, the [GHA policy|https://infra.apache.org/github-actions-policy.html] is referenced which I wasn't aware of when working on the action workflow. This issue is about applying the policy to the Flink GHA workflows. was: There is a policy update [announced in the infra ML|https://lists.apache.org/thread/6qw21x44q88rc3mhkn42jgjjw94rsvb1] which asked Apache projects to limit the number of runners per job. Additionally, the [GHA policy|https://infra.apache.org/github-actions-policy.html] is referenced which I wasn't aware of when working on the action workflow. This issue is about applying the policy to the Flink GHA workflows. > Apache Infra GHA policy update > -- > > Key: FLINK-34937 > URL: https://issues.apache.org/jira/browse/FLINK-34937 > Project: Flink > Issue Type: Sub-task > Components: Build System / CI >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Major > > There is a policy update [announced in the infra > ML|https://www.mail-archive.com/jdo-dev@db.apache.org/msg13638.html] which > asked Apache projects to limit the number of runners per job. Additionally, > the [GHA policy|https://infra.apache.org/github-actions-policy.html] is > referenced which I wasn't aware of when working on the action workflow. > This issue is about applying the policy to the Flink GHA workflows. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34939) Harden TestingLeaderElection
Matthias Pohl created FLINK-34939: - Summary: Harden TestingLeaderElection Key: FLINK-34939 URL: https://issues.apache.org/jira/browse/FLINK-34939 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl The {{TestingLeaderElection}} implementation does not follow the interface contract of {{LeaderElection}} in all of its facets (e.g. leadership acquire and revocation events should alternate). This issue is about hardening the {{LeaderElection}} contract in the test implementation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34937) Apache Infra GHA policy update
[ https://issues.apache.org/jira/browse/FLINK-34937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34937: -- Parent: FLINK-33901 Issue Type: Sub-task (was: Bug) > Apache Infra GHA policy update > -- > > Key: FLINK-34937 > URL: https://issues.apache.org/jira/browse/FLINK-34937 > Project: Flink > Issue Type: Sub-task > Components: Build System / CI >Affects Versions: 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Priority: Major > > There is a policy update [announced in the infra > ML|https://lists.apache.org/thread/6qw21x44q88rc3mhkn42jgjjw94rsvb1] which > asked Apache projects to limit the number of runners per job. Additionally, > the [GHA policy|https://infra.apache.org/github-actions-policy.html] is > referenced which I wasn't aware of when working on the action workflow. > This issue is about applying the policy to the Flink GHA workflows. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34937) Apache Infra GHA policy update
Matthias Pohl created FLINK-34937: - Summary: Apache Infra GHA policy update Key: FLINK-34937 URL: https://issues.apache.org/jira/browse/FLINK-34937 Project: Flink Issue Type: Bug Components: Build System / CI Affects Versions: 1.18.1, 1.19.0, 1.20.0 Reporter: Matthias Pohl There is a policy update [announced in the infra ML|https://lists.apache.org/thread/6qw21x44q88rc3mhkn42jgjjw94rsvb1] which asked Apache projects to limit the number of runners per job. Additionally, the [GHA policy|https://infra.apache.org/github-actions-policy.html] is referenced which I wasn't aware of when working on the action workflow. This issue is about applying the policy to the Flink GHA workflows. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34643) JobIDLoggingITCase failed
[ https://issues.apache.org/jira/browse/FLINK-34643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830812#comment-17830812 ] Matthias Pohl commented on FLINK-34643: --- Should we try to reproduce the test failure in a PR by modifying the CI scripts (i.e. executing the test in a loop)? That way we could disable the test in {{master}} for now. > JobIDLoggingITCase failed > - > > Key: FLINK-34643 > URL: https://issues.apache.org/jira/browse/FLINK-34643 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.20.0 >Reporter: Matthias Pohl >Assignee: Roman Khachatryan >Priority: Major > Labels: pull-request-available, test-stability > Fix For: 1.20.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=7897 > {code} > Mar 09 01:24:23 01:24:23.498 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 4.209 s <<< FAILURE! -- in > org.apache.flink.test.misc.JobIDLoggingITCase > Mar 09 01:24:23 01:24:23.498 [ERROR] > org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(ClusterClient) > -- Time elapsed: 1.459 s <<< ERROR! 
> Mar 09 01:24:23 java.lang.IllegalStateException: Too few log events recorded > for org.apache.flink.runtime.jobmaster.JobMaster (12) - this must be a bug in > the test code > Mar 09 01:24:23 at > org.apache.flink.util.Preconditions.checkState(Preconditions.java:215) > Mar 09 01:24:23 at > org.apache.flink.test.misc.JobIDLoggingITCase.assertJobIDPresent(JobIDLoggingITCase.java:148) > Mar 09 01:24:23 at > org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(JobIDLoggingITCase.java:132) > Mar 09 01:24:23 at java.lang.reflect.Method.invoke(Method.java:498) > Mar 09 01:24:23 at > java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) > Mar 09 01:24:23 > {code} > The other test failures of this build were also caused by the same test: > * > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=2c3cbe13-dee0-5837-cf47-3053da9a8a78=b78d9d30-509a-5cea-1fef-db7abaa325ae=8349 > * > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=8209 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-33900) Multiple failures in WindowRankITCase due to NoResourceAvailableException
[ https://issues.apache.org/jira/browse/FLINK-33900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl closed FLINK-33900. - Resolution: Duplicate For the failures where the logs were not removed yet, I checked that it's actually a duplicate of FLINK-34227. Closing this one in favor of FLINK-34227. > Multiple failures in WindowRankITCase due to NoResourceAvailableException > - > > Key: FLINK-33900 > URL: https://issues.apache.org/jira/browse/FLINK-33900 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.18.0, 1.19.0 >Reporter: Matthias Pohl >Priority: Major > Labels: github-actions, test-stability > > [https://github.com/XComp/flink/actions/runs/7244405295/job/19733011527#step:12:14989] > There are multiple tests in {{WindowRankITCase}} that fail due to a > {{NoResourceAvailableException}} supposedly: > {code:java} > [...] > Error: 09:19:33 09:19:32.966 [ERROR] > WindowRankITCase.testTumbleWindowTVFWithOffset Time elapsed: 300.072 s <<< > FAILURE! > 14558Dec 18 09:19:33 org.opentest4j.MultipleFailuresError: > 14559Dec 18 09:19:33 Multiple Failures (2 failures) > 14560Dec 18 09:19:33 org.apache.flink.runtime.client.JobExecutionException: > Job execution failed. 
> 14561Dec 18 09:19:33 java.lang.AssertionError: > 14562Dec 18 09:19:33 at > org.junit.vintage.engine.execution.TestRun.getStoredResultOrSuccessful(TestRun.java:200) > 14563Dec 18 09:19:33 at > org.junit.vintage.engine.execution.RunListenerAdapter.fireExecutionFinished(RunListenerAdapter.java:248) > 14564Dec 18 09:19:33 at > org.junit.vintage.engine.execution.RunListenerAdapter.testFinished(RunListenerAdapter.java:214) > 14565Dec 18 09:19:33 at > org.junit.vintage.engine.execution.RunListenerAdapter.testFinished(RunListenerAdapter.java:88) > 14566Dec 18 09:19:33 at > org.junit.runner.notification.SynchronizedRunListener.testFinished(SynchronizedRunListener.java:87) > 14567Dec 18 09:19:33 at > org.junit.runner.notification.RunNotifier$9.notifyListener(RunNotifier.java:225) > 14568Dec 18 09:19:33 at > org.junit.runner.notification.RunNotifier$SafeNotifier.run(RunNotifier.java:72) > 14569Dec 18 09:19:33 at > org.junit.runner.notification.RunNotifier.fireTestFinished(RunNotifier.java:222) > 14570Dec 18 09:19:33 at > org.junit.internal.runners.model.EachTestNotifier.fireTestFinished(EachTestNotifier.java:38) > 14571Dec 18 09:19:33 at > org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:372) > 14572Dec 18 09:19:33 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > 14573Dec 18 09:19:33 at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > 14574Dec 18 09:19:33 at > org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > 14575Dec 18 09:19:33 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > 14576Dec 18 09:19:33 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > 14577Dec 18 09:19:33 at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > 14578Dec 18 09:19:33 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > 14579Dec 18 09:19:33 at > org.junit.runners.ParentRunner.run(ParentRunner.java:413) > 14580Dec 18 09:19:33 
at org.junit.runners.Suite.runChild(Suite.java:128) > 14581Dec 18 09:19:33 at org.junit.runners.Suite.runChild(Suite.java:27) > 14582Dec 18 09:19:33 at > org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > 14583Dec 18 09:19:33 at > org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > 14584Dec 18 09:19:33 at > org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > 14585Dec 18 09:19:33 at > org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > 14586Dec 18 09:19:33 at > org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > 14587Dec 18 09:19:33 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > 14588Dec 18 09:19:33 at > org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) > 14589Dec 18 09:19:33 at org.junit.rules.RunRules.evaluate(RunRules.java:20) > 14590Dec 18 09:19:33 at > org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > 14591Dec 18 09:19:33 at > org.junit.runners.ParentRunner.run(ParentRunner.java:413) > 14592Dec 18 09:19:33 at org.junit.runner.JUnitCore.run(JUnitCore.java:137) > 14593Dec 18 09:19:33 at org.junit.runner.JUnitCore.run(JUnitCore.java:115) > 14594Dec 18 09:19:33 at > org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:42) > 14595Dec 18 09:19:33 at > org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:80) > 14596Dec 18 09:19:33 at >
[jira] [Commented] (FLINK-34227) Job doesn't disconnect from ResourceManager
[ https://issues.apache.org/jira/browse/FLINK-34227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830581#comment-17830581 ] Matthias Pohl commented on FLINK-34227: --- https://github.com/apache/flink/actions/runs/8414062328/job/23037443503#step:10:12562 > Job doesn't disconnect from ResourceManager > --- > > Key: FLINK-34227 > URL: https://issues.apache.org/jira/browse/FLINK-34227 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.19.0, 1.18.1 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Critical > Labels: github-actions, pull-request-available, test-stability > Attachments: FLINK-34227.7e7d69daebb438b8d03b7392c9c55115.log, > FLINK-34227.log > > > https://github.com/XComp/flink/actions/runs/7634987973/job/20800205972#step:10:14557 > {code} > [...] > "main" #1 prio=5 os_prio=0 tid=0x7f4b7000 nid=0x24ec0 waiting on > condition [0x7fccce1eb000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xbdd52618> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) > at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2131) > at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2099) > at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2077) > at > 
org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:876) > at > org.apache.flink.table.planner.runtime.stream.sql.WindowDistinctAggregateITCase.testHopWindow_Cube(WindowDistinctAggregateITCase.scala:550) > [...] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
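The thread dump in FLINK-34227 shows the client blocked indefinitely in {{CompletableFuture.get()}}. A common defensive pattern for such waits in test code is to bound them so a hang surfaces as a diagnosable timeout instead of a stuck build. This is a generic sketch, not the Flink test's actual code; the class and method names are made up.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: bound a blocking wait so a hang becomes a clear failure. */
public final class BoundedWait {
    public static <T> T getWithTimeout(CompletableFuture<T> future, long seconds) {
        try {
            // Unlike future.get(), this cannot block a CI worker forever.
            return future.get(seconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            throw new AssertionError(
                    "Future did not complete within " + seconds + "s - possible deadlock", e);
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }
}
```

With a bound in place, a deadlock like the one above would fail fast with a stack trace pointing at the unfinished future rather than leaving the main thread parked.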
[jira] [Commented] (FLINK-34273) git fetch fails
[ https://issues.apache.org/jira/browse/FLINK-34273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830565#comment-17830565 ] Matthias Pohl commented on FLINK-34273: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58519=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=bc77b88f-20e6-5fb3-ac3b-0b6efcca48c5=406 > git fetch fails > --- > > Key: FLINK-34273 > URL: https://issues.apache.org/jira/browse/FLINK-34273 > Project: Flink > Issue Type: Bug > Components: Build System / CI, Test Infrastructure >Affects Versions: 1.19.0, 1.18.1 >Reporter: Matthias Pohl >Priority: Major > Labels: test-stability > > We've seen multiple {{git fetch}} failures. I assume this to be an > infrastructure issue. This Jira issue is for documentation purposes. > {code:java} > error: RPC failed; curl 18 transfer closed with outstanding read data > remaining > error: 5211 bytes of body are still expected > fetch-pack: unexpected disconnect while reading sideband packet > fatal: early EOF > fatal: fetch-pack: invalid index-pack output {code} > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=57080=logs=0e7be18f-84f2-53f0-a32d-4a5e4a174679=5d6dc3d3-393d-5111-3a40-c6a5a36202e6=667 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-30719) flink-runtime-web failed due to a corrupted nodejs dependency
[ https://issues.apache.org/jira/browse/FLINK-30719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830562#comment-17830562 ] Matthias Pohl commented on FLINK-30719: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58502=logs=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5=54421a62-0c80-5aad-3319-094ff69180bb=9714 Slightly different error but still worth mentioning: {code} 13:36:43.413 [ERROR] Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.11.0:install-node-and-npm (install node and npm) on project flink-runtime-web: Could not download Node.js: Got error code 525 from the server. -> [Help 1] {code} > flink-runtime-web failed due to a corrupted nodejs dependency > - > > Key: FLINK-30719 > URL: https://issues.apache.org/jira/browse/FLINK-30719 > Project: Flink > Issue Type: Bug > Components: Runtime / Web Frontend, Test Infrastructure, Tests >Affects Versions: 1.16.0, 1.17.0, 1.18.0 >Reporter: Matthias Pohl >Assignee: Sergey Nuyanzin >Priority: Critical > Labels: test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=44954=logs=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5=54421a62-0c80-5aad-3319-094ff69180bb=12550 > The build failed due to a corrupted nodejs dependency: > {code} > [ERROR] The archive file > /__w/1/.m2/repository/com/github/eirslett/node/16.13.2/node-16.13.2-linux-x64.tar.gz > is corrupted and will be deleted. Please try the build again. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-21450) Add local recovery support to adaptive scheduler
[ https://issues.apache.org/jira/browse/FLINK-21450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830558#comment-17830558 ] Matthias Pohl commented on FLINK-21450: --- Enabling the tests for AdaptiveScheduler (see FLINK-34409): * master ** [8f06fb472ba6a10f0829aecf1eedee26e924aa6d|https://github.com/apache/flink/commit/8f06fb472ba6a10f0829aecf1eedee26e924aa6d] * 1.19 ** [00492630baa5cf041ea2cce2a3560f3e713bf57a|https://github.com/apache/flink/commit/00492630baa5cf041ea2cce2a3560f3e713bf57a] * 1.18 ** [f5c243097ac9fae29c3365a2361b7b0c6be3b3ee|https://github.com/apache/flink/commit/f5c243097ac9fae29c3365a2361b7b0c6be3b3ee] > Add local recovery support to adaptive scheduler > > > Key: FLINK-21450 > URL: https://issues.apache.org/jira/browse/FLINK-21450 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Reporter: Robert Metzger >Assignee: Roman Khachatryan >Priority: Major > Labels: auto-deprioritized-major, auto-deprioritized-minor, > auto-unassigned, pull-request-available > Fix For: 1.18.0 > > > local recovery means that, on a failure, we are able to re-use the state in a > taskmanager, instead of loading it again from distributed storage (which > means the scheduler needs to know where which state is located, and schedule > tasks accordingly). > Adaptive Scheduler is currently not respecting the location of state, so > failures require the re-loading of state from the distributed storage. > Adding this feature will allow us to enable the {{Local recovery and sticky > scheduling end-to-end test}} for adaptive scheduler again. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-21535) UnalignedCheckpointITCase.execute failed with "OutOfMemoryError: Java heap space"
[ https://issues.apache.org/jira/browse/FLINK-21535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830557#comment-17830557 ] Matthias Pohl commented on FLINK-21535: --- Enabling the tests for the AdaptiveScheduler (see FLINK-34409): * master ** [96142404c143f2094af262b8ac02a8b06aa773d5|https://github.com/apache/flink/commit/96142404c143f2094af262b8ac02a8b06aa773d5] * 1.19 ** [7d107966dbe7e38e43680fabf3ffdfeaa71e8d3c|https://github.com/apache/flink/commit/7d107966dbe7e38e43680fabf3ffdfeaa71e8d3c] * 1.18 ** [836b332b2d100e21b1d0008257a009d9ec09e13a|https://github.com/apache/flink/commit/836b332b2d100e21b1d0008257a009d9ec09e13a] > UnalignedCheckpointITCase.execute failed with "OutOfMemoryError: Java heap > space" > - > > Key: FLINK-21535 > URL: https://issues.apache.org/jira/browse/FLINK-21535 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.13.0 >Reporter: Dawid Wysakowicz >Assignee: Arvid Heise >Priority: Major > Labels: pull-request-available, test-stability > Fix For: 1.13.0, 1.12.3 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=13866=logs=39d5b1d5-3b41-54dc-6458-1e2ddd1cdcf3=a99e99c7-21cd-5a1f-7274-585e62b72f56 > {code} > 2021-02-27T02:11:41.5659201Z > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. 
> 2021-02-27T02:11:41.5659947Z at > org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144) > 2021-02-27T02:11:41.5660794Z at > org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:137) > 2021-02-27T02:11:41.5661618Z at > java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616) > 2021-02-27T02:11:41.5662356Z at > java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) > 2021-02-27T02:11:41.5663104Z at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > 2021-02-27T02:11:41.5664016Z at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) > 2021-02-27T02:11:41.5664817Z at > org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:237) > 2021-02-27T02:11:41.5665638Z at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) > 2021-02-27T02:11:41.5666405Z at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) > 2021-02-27T02:11:41.5667609Z at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) > 2021-02-27T02:11:41.5668358Z at > java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) > 2021-02-27T02:11:41.5669218Z at > org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:1066) > 2021-02-27T02:11:41.5669928Z at > akka.dispatch.OnComplete.internal(Future.scala:264) > 2021-02-27T02:11:41.5670540Z at > akka.dispatch.OnComplete.internal(Future.scala:261) > 2021-02-27T02:11:41.5671268Z at > akka.dispatch.japi$CallbackBridge.apply(Future.scala:191) > 2021-02-27T02:11:41.5671881Z at > akka.dispatch.japi$CallbackBridge.apply(Future.scala:188) > 2021-02-27T02:11:41.5672512Z at > scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36) > 2021-02-27T02:11:41.5673219Z at > 
org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:73) > 2021-02-27T02:11:41.5674085Z at > scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44) > 2021-02-27T02:11:41.5674794Z at > scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252) > 2021-02-27T02:11:41.5675466Z at > akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572) > 2021-02-27T02:11:41.5676181Z at > akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:22) > 2021-02-27T02:11:41.5676977Z at > akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:21) > 2021-02-27T02:11:41.5677717Z at > scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436) > 2021-02-27T02:11:41.5678409Z at > scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435) > 2021-02-27T02:11:41.5679071Z at > scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36) > 2021-02-27T02:11:41.5679776Z at > akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55) > 2021-02-27T02:11:41.5680576Z at > akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91) > 2021-02-27T02:11:41.5681383Z at >
[jira] [Commented] (FLINK-21400) Attempt numbers are not maintained across restarts
[ https://issues.apache.org/jira/browse/FLINK-21400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830556#comment-17830556 ] Matthias Pohl commented on FLINK-21400: --- Enabling the tests for AdaptiveScheduler (see FLINK-34409): * master ** [a1d17ccf0eec7dd614146e22832037cadd7abe5c|https://github.com/apache/flink/commit/a1d17ccf0eec7dd614146e22832037cadd7abe5c] * 1.19 ** [4fc36e9abaa8cc2d0e01c1e389b449f563b87e8e|https://github.com/apache/flink/commit/4fc36e9abaa8cc2d0e01c1e389b449f563b87e8e] * 1.18 ** [8f6890fbd757f3d3c9c891ea9139a1e5ac3412a2|https://github.com/apache/flink/commit/8f6890fbd757f3d3c9c891ea9139a1e5ac3412a2] > Attempt numbers are not maintained across restarts > -- > > Key: FLINK-21400 > URL: https://issues.apache.org/jira/browse/FLINK-21400 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Critical > Labels: pull-request-available > Fix For: 1.13.0 > > > The DeclarativeScheduler discards the ExecutionGraph on each restart attempt, > as a result of which the attempt number remains 0. > Various tests use the attempt number to determine whether an exception should > be thrown, and thus continue to throw exceptions on each restart. > Affected tests: > UnalignedCheckpointTestBase > UnalignedCheckpointITCase > ProcessingTimeWindowCheckpointingITCase > LocalRecoveryITCase > EventTimeWindowCheckpointingITCase > EventTimeAllWindowCheckpointingITCase > FileSinkITBase#testFileSink -- This message was sent by Atlassian Jira (v8.20.10#820010)
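FLINK-21400 describes tests that throw only on early attempts and therefore rely on attempt numbers surviving restarts. The pattern can be sketched as follows; this is an illustrative stand-in, not the actual Flink test code, and the class name is made up.

```java
/**
 * Illustrative sketch of the affected test pattern: fail the first N attempts,
 * then succeed. If the scheduler resets the attempt number to 0 on every
 * restart (the bug described in FLINK-21400), run(0) is re-evaluated forever
 * and the task never stops throwing.
 */
public final class FlakyTask {
    private final int failingAttempts;

    public FlakyTask(int failingAttempts) {
        this.failingAttempts = failingAttempts;
    }

    public String run(int attemptNumber) {
        if (attemptNumber < failingAttempts) {
            throw new RuntimeException("induced failure on attempt " + attemptNumber);
        }
        return "done";
    }
}
```

This makes the dependency explicit: the listed tests only terminate if the scheduler hands each retry a strictly increasing attempt number.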
[jira] [Resolved] (FLINK-34409) Increase test coverage for AdaptiveScheduler
[ https://issues.apache.org/jira/browse/FLINK-34409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl resolved FLINK-34409. --- Fix Version/s: 1.18.2 1.20.0 1.19.1 Resolution: Fixed master: [1aa35b95975560da6afb7fcf0ad80f0a25c5d183|https://github.com/apache/flink/commit/1aa35b95975560da6afb7fcf0ad80f0a25c5d183] 1.19: [f82ff7c656d3eeb3e82b456d284639e59624a849|https://github.com/apache/flink/commit/f82ff7c656d3eeb3e82b456d284639e59624a849] 1.18: [f2a6ff5a97bf27d68be1188c05158e18df810549|https://github.com/apache/flink/commit/f2a6ff5a97bf27d68be1188c05158e18df810549] > Increase test coverage for AdaptiveScheduler > > > Key: FLINK-34409 > URL: https://issues.apache.org/jira/browse/FLINK-34409 > Project: Flink > Issue Type: Technical Debt > Components: Runtime / Coordination >Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Major > Labels: pull-request-available > Fix For: 1.18.2, 1.20.0, 1.19.1 > > > There are still several tests disabled for the {{AdaptiveScheduler}} which we > can enable now. All the issues seem to have been fixed. > We can even remove the annotation {{@FailsWithAdaptiveScheduler}} now. It's > not needed anymore. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-34933) JobMasterServiceLeadershipRunnerTest#testResultFutureCompletionOfOutdatedLeaderIsIgnored isn't implemented properly
[ https://issues.apache.org/jira/browse/FLINK-34933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl reassigned FLINK-34933: - Assignee: Matthias Pohl > JobMasterServiceLeadershipRunnerTest#testResultFutureCompletionOfOutdatedLeaderIsIgnored > isn't implemented properly > --- > > Key: FLINK-34933 > URL: https://issues.apache.org/jira/browse/FLINK-34933 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Major > > {{testResultFutureCompletionOfOutdatedLeaderIsIgnored}} doesn't test the > desired behavior: The {{TestingJobMasterService#closeAsync()}} callback > throws an {{UnsupportedOperationException}} by default which prevents the > test from properly finalizing the leadership revocation. > The test is still passing because the test checks implicitly for this error. > Instead, we should verify that the runner's resultFuture doesn't complete > until the runner is closed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34933) JobMasterServiceLeadershipRunnerTest#testResultFutureCompletionOfOutdatedLeaderIsIgnored isn't implemented properly
Matthias Pohl created FLINK-34933: - Summary: JobMasterServiceLeadershipRunnerTest#testResultFutureCompletionOfOutdatedLeaderIsIgnored isn't implemented properly Key: FLINK-34933 URL: https://issues.apache.org/jira/browse/FLINK-34933 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.18.1, 1.19.0, 1.17.2, 1.20.0 Reporter: Matthias Pohl {{testResultFutureCompletionOfOutdatedLeaderIsIgnored}} doesn't test the desired behavior: The {{TestingJobMasterService#closeAsync()}} callback throws an {{UnsupportedOperationException}} by default which prevents the test from properly finalizing the leadership revocation. The test is still passing because the test checks implicitly for this error. Instead, we should verify that the runner's resultFuture doesn't complete until the runner is closed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
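The intended assertion ("the runner's resultFuture doesn't complete until the runner is closed") can be sketched as follows. This is a minimal, hypothetical stand-in, not the real `JobMasterServiceLeadershipRunner`; the class and method names are assumptions for illustration only.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stub mirroring the contract under test: leadership
// revocation alone must NOT complete the result future; only closing
// the runner completes it.
final class RunnerStub {
    private final CompletableFuture<Void> resultFuture = new CompletableFuture<>();

    void revokeLeadership() {
        // intentionally does not touch resultFuture
    }

    void close() {
        resultFuture.complete(null);
    }

    CompletableFuture<Void> getResultFuture() {
        return resultFuture;
    }
}
```

A proper test would then revoke leadership, assert the future is still incomplete, close the runner, and only then assert completion — rather than relying on a default `UnsupportedOperationException` from `TestingJobMasterService#closeAsync()`.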
[jira] [Comment Edited] (FLINK-33816) SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain failed due async checkpoint triggering not being completed
[ https://issues.apache.org/jira/browse/FLINK-33816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829918#comment-17829918 ] Matthias Pohl edited comment on FLINK-33816 at 3/22/24 3:51 PM: I created the [1.19 backport|https://github.com/apache/flink/pull/24556]. Is this also affecting 1.18? Based on the git history I would assume so. was (Author: mapohl): I created the 1.19 backport. Is this also affecting 1.18? Based on the git history I would assume so. > SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain failed due > async checkpoint triggering not being completed > - > > Key: FLINK-33816 > URL: https://issues.apache.org/jira/browse/FLINK-33816 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Checkpointing, Runtime / Coordination >Affects Versions: 1.19.0 >Reporter: Matthias Pohl >Assignee: jiabao.sun >Priority: Major > Labels: github-actions, pull-request-available, test-stability > Fix For: 1.20.0 > > Attachments: screenshot-1.png > > > [https://github.com/XComp/flink/actions/runs/7182604625/job/19559947894#step:12:9430] > {code:java} > rror: 14:39:01 14:39:01.930 [ERROR] Tests run: 16, Failures: 1, Errors: 0, > Skipped: 0, Time elapsed: 1.878 s <<< FAILURE! - in > org.apache.flink.streaming.runtime.tasks.SourceStreamTaskTest > 9426Error: 14:39:01 14:39:01.930 [ERROR] > org.apache.flink.streaming.runtime.tasks.SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain > Time elapsed: 0.034 s <<< FAILURE! 
> 9427Dec 12 14:39:01 org.opentest4j.AssertionFailedError: > 9428Dec 12 14:39:01 > 9429Dec 12 14:39:01 Expecting value to be true but was false > 9430Dec 12 14:39:01 at > java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62) > 9431Dec 12 14:39:01 at > java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502) > 9432Dec 12 14:39:01 at > org.apache.flink.streaming.runtime.tasks.SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain(SourceStreamTaskTest.java:710) > 9433Dec 12 14:39:01 at > java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > 9434Dec 12 14:39:01 at > java.base/java.lang.reflect.Method.invoke(Method.java:580) > [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33816) SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain failed due async checkpoint triggering not being completed
[ https://issues.apache.org/jira/browse/FLINK-33816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829918#comment-17829918 ] Matthias Pohl commented on FLINK-33816: --- I created the 1.19 backport. Is this also affecting 1.18? Based on the git history I would assume so. > SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain failed due > async checkpoint triggering not being completed > - > > Key: FLINK-33816 > URL: https://issues.apache.org/jira/browse/FLINK-33816 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Checkpointing, Runtime / Coordination >Affects Versions: 1.19.0 >Reporter: Matthias Pohl >Assignee: jiabao.sun >Priority: Major > Labels: github-actions, pull-request-available, test-stability > Fix For: 1.20.0 > > Attachments: screenshot-1.png > > > [https://github.com/XComp/flink/actions/runs/7182604625/job/19559947894#step:12:9430] > {code:java} > rror: 14:39:01 14:39:01.930 [ERROR] Tests run: 16, Failures: 1, Errors: 0, > Skipped: 0, Time elapsed: 1.878 s <<< FAILURE! - in > org.apache.flink.streaming.runtime.tasks.SourceStreamTaskTest > 9426Error: 14:39:01 14:39:01.930 [ERROR] > org.apache.flink.streaming.runtime.tasks.SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain > Time elapsed: 0.034 s <<< FAILURE! 
> 9427Dec 12 14:39:01 org.opentest4j.AssertionFailedError: > 9428Dec 12 14:39:01 > 9429Dec 12 14:39:01 Expecting value to be true but was false > 9430Dec 12 14:39:01 at > java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62) > 9431Dec 12 14:39:01 at > java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502) > 9432Dec 12 14:39:01 at > org.apache.flink.streaming.runtime.tasks.SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain(SourceStreamTaskTest.java:710) > 9433Dec 12 14:39:01 at > java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > 9434Dec 12 14:39:01 at > java.base/java.lang.reflect.Method.invoke(Method.java:580) > [...] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33816) SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain failed due async checkpoint triggering not being completed
[ https://issues.apache.org/jira/browse/FLINK-33816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-33816: -- Fix Version/s: 1.20.0 (was: 2.0.0) > SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain failed due > async checkpoint triggering not being completed > - > > Key: FLINK-33816 > URL: https://issues.apache.org/jira/browse/FLINK-33816 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Checkpointing, Runtime / Coordination >Affects Versions: 1.19.0 >Reporter: Matthias Pohl >Assignee: jiabao.sun >Priority: Major > Labels: github-actions, pull-request-available, test-stability > Fix For: 1.20.0 > > Attachments: screenshot-1.png > > > [https://github.com/XComp/flink/actions/runs/7182604625/job/19559947894#step:12:9430] > {code:java} > rror: 14:39:01 14:39:01.930 [ERROR] Tests run: 16, Failures: 1, Errors: 0, > Skipped: 0, Time elapsed: 1.878 s <<< FAILURE! - in > org.apache.flink.streaming.runtime.tasks.SourceStreamTaskTest > 9426Error: 14:39:01 14:39:01.930 [ERROR] > org.apache.flink.streaming.runtime.tasks.SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain > Time elapsed: 0.034 s <<< FAILURE! > 9427Dec 12 14:39:01 org.opentest4j.AssertionFailedError: > 9428Dec 12 14:39:01 > 9429Dec 12 14:39:01 Expecting value to be true but was false > 9430Dec 12 14:39:01 at > java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62) > 9431Dec 12 14:39:01 at > java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502) > 9432Dec 12 14:39:01 at > org.apache.flink.streaming.runtime.tasks.SourceStreamTaskTest.testTriggeringStopWithSavepointWithDrain(SourceStreamTaskTest.java:710) > 9433Dec 12 14:39:01 at > java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > 9434Dec 12 14:39:01 at > java.base/java.lang.reflect.Method.invoke(Method.java:580) > [...] 
{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-34643) JobIDLoggingITCase failed
[ https://issues.apache.org/jira/browse/FLINK-34643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829876#comment-17829876 ] Matthias Pohl edited comment on FLINK-34643 at 3/22/24 3:44 PM: * [https://github.com/apache/flink/actions/runs/8375475096/job/22933386950#step:10:7849] * [https://github.com/apache/flink/actions/runs/8384698540/job/22962603273#step:10:8296] * https://github.com/apache/flink/actions/runs/8384423503/job/22961956846#step:10:7958 was (Author: ryanskraba): * [https://github.com/apache/flink/actions/runs/8375475096/job/22933386950#step:10:7849] * [https://github.com/apache/flink/actions/runs/8384698540/job/22962603273#step:10:8296] * https://github.com/apache/flink/actions/runs/8375475096/job/22933386950#step:10:7849 > JobIDLoggingITCase failed > - > > Key: FLINK-34643 > URL: https://issues.apache.org/jira/browse/FLINK-34643 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.20.0 >Reporter: Matthias Pohl >Assignee: Roman Khachatryan >Priority: Major > Labels: pull-request-available, test-stability > Fix For: 1.20.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=7897 > {code} > Mar 09 01:24:23 01:24:23.498 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 4.209 s <<< FAILURE! -- in > org.apache.flink.test.misc.JobIDLoggingITCase > Mar 09 01:24:23 01:24:23.498 [ERROR] > org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(ClusterClient) > -- Time elapsed: 1.459 s <<< ERROR! 
> Mar 09 01:24:23 java.lang.IllegalStateException: Too few log events recorded > for org.apache.flink.runtime.jobmaster.JobMaster (12) - this must be a bug in > the test code > Mar 09 01:24:23 at > org.apache.flink.util.Preconditions.checkState(Preconditions.java:215) > Mar 09 01:24:23 at > org.apache.flink.test.misc.JobIDLoggingITCase.assertJobIDPresent(JobIDLoggingITCase.java:148) > Mar 09 01:24:23 at > org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(JobIDLoggingITCase.java:132) > Mar 09 01:24:23 at java.lang.reflect.Method.invoke(Method.java:498) > Mar 09 01:24:23 at > java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) > Mar 09 01:24:23 > {code} > The other test failures of this build were also caused by the same test: > * > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=2c3cbe13-dee0-5837-cf47-3053da9a8a78=b78d9d30-509a-5cea-1fef-db7abaa325ae=8349 > * > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=8209 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-18476) PythonEnvUtilsTest#testStartPythonProcess fails
[ https://issues.apache.org/jira/browse/FLINK-18476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-18476: -- Affects Version/s: 1.20.0 > PythonEnvUtilsTest#testStartPythonProcess fails > --- > > Key: FLINK-18476 > URL: https://issues.apache.org/jira/browse/FLINK-18476 > Project: Flink > Issue Type: Bug > Components: API / Python, Tests >Affects Versions: 1.11.0, 1.15.3, 1.18.0, 1.19.0, 1.20.0 >Reporter: Dawid Wysakowicz >Priority: Major > Labels: auto-deprioritized-major, auto-deprioritized-minor, > test-stability > > The > {{org.apache.flink.client.python.PythonEnvUtilsTest#testStartPythonProcess}} > failed in my local environment as it assumes the environment has > {{/usr/bin/python}}. > I don't know exactly how did I get python in Ubuntu 20.04, but I have only > alias for {{python = python3}}. Therefore the tests fails. -- This message was sent by Atlassian Jira (v8.20.10#820010)
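One way to avoid hard-coding {{/usr/bin/python}} is to resolve the interpreter from a candidate list, so environments that only ship {{python3}} (such as Ubuntu 20.04, as described above) still work. A minimal sketch, assuming a caller-supplied predicate for "is on the PATH" (the class and method names here are hypothetical, not Flink's actual `PythonEnvUtils` API):

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch: pick the first interpreter candidate that exists,
// instead of assuming a fixed path like /usr/bin/python.
final class PythonResolver {
    static Optional<String> resolve(List<String> candidates, Predicate<String> isOnPath) {
        return candidates.stream().filter(isOnPath).findFirst();
    }
}
```

In a real implementation the predicate would probe the filesystem or PATH (e.g. via `ProcessBuilder`); injecting it keeps the resolution logic testable without a Python installation.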
[jira] [Updated] (FLINK-34919) WebMonitorEndpointTest.cleansUpExpiredExecutionGraphs fails starting REST server
[ https://issues.apache.org/jira/browse/FLINK-34919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34919: -- Component/s: Runtime / Coordination > WebMonitorEndpointTest.cleansUpExpiredExecutionGraphs fails starting REST > server > > > Key: FLINK-34919 > URL: https://issues.apache.org/jira/browse/FLINK-34919 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.20.0 >Reporter: Ryan Skraba >Priority: Critical > Labels: test-stability > > [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58482=logs=77a9d8e1-d610-59b3-fc2a-4766541e0e33=125e07e7-8de0-5c6c-a541-a567415af3ef=8641] > {code:java} > Mar 22 04:12:50 04:12:50.260 [INFO] Running > org.apache.flink.runtime.webmonitor.WebMonitorEndpointTest > Mar 22 04:12:50 04:12:50.609 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 0.318 s <<< FAILURE! -- in > org.apache.flink.runtime.webmonitor.WebMonitorEndpointTest > Mar 22 04:12:50 04:12:50.609 [ERROR] > org.apache.flink.runtime.webmonitor.WebMonitorEndpointTest.cleansUpExpiredExecutionGraphs > -- Time elapsed: 0.303 s <<< ERROR! 
> Mar 22 04:12:50 java.net.BindException: Could not start rest endpoint on any > port in port range 8081 > Mar 22 04:12:50 at > org.apache.flink.runtime.rest.RestServerEndpoint.start(RestServerEndpoint.java:286) > Mar 22 04:12:50 at > org.apache.flink.runtime.webmonitor.WebMonitorEndpointTest.cleansUpExpiredExecutionGraphs(WebMonitorEndpointTest.java:69) > Mar 22 04:12:50 at java.lang.reflect.Method.invoke(Method.java:498) > Mar 22 04:12:50 at > java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) > Mar 22 04:12:50 at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) > Mar 22 04:12:50 at > java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) > Mar 22 04:12:50 at > java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) > Mar 22 04:12:50 at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) > Mar 22 04:12:50 {code} > This was noted as a symptom of FLINK-22980, but doesn't have the same failure. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34919) WebMonitorEndpointTest.cleansUpExpiredExecutionGraphs fails starting REST server
[ https://issues.apache.org/jira/browse/FLINK-34919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34919: -- Affects Version/s: 1.19.0 > WebMonitorEndpointTest.cleansUpExpiredExecutionGraphs fails starting REST > server > > > Key: FLINK-34919 > URL: https://issues.apache.org/jira/browse/FLINK-34919 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.19.0, 1.20.0 >Reporter: Ryan Skraba >Priority: Critical > Labels: test-stability > > [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58482=logs=77a9d8e1-d610-59b3-fc2a-4766541e0e33=125e07e7-8de0-5c6c-a541-a567415af3ef=8641] > {code:java} > Mar 22 04:12:50 04:12:50.260 [INFO] Running > org.apache.flink.runtime.webmonitor.WebMonitorEndpointTest > Mar 22 04:12:50 04:12:50.609 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 0.318 s <<< FAILURE! -- in > org.apache.flink.runtime.webmonitor.WebMonitorEndpointTest > Mar 22 04:12:50 04:12:50.609 [ERROR] > org.apache.flink.runtime.webmonitor.WebMonitorEndpointTest.cleansUpExpiredExecutionGraphs > -- Time elapsed: 0.303 s <<< ERROR! 
> Mar 22 04:12:50 java.net.BindException: Could not start rest endpoint on any > port in port range 8081 > Mar 22 04:12:50 at > org.apache.flink.runtime.rest.RestServerEndpoint.start(RestServerEndpoint.java:286) > Mar 22 04:12:50 at > org.apache.flink.runtime.webmonitor.WebMonitorEndpointTest.cleansUpExpiredExecutionGraphs(WebMonitorEndpointTest.java:69) > Mar 22 04:12:50 at java.lang.reflect.Method.invoke(Method.java:498) > Mar 22 04:12:50 at > java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) > Mar 22 04:12:50 at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) > Mar 22 04:12:50 at > java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) > Mar 22 04:12:50 at > java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) > Mar 22 04:12:50 at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) > Mar 22 04:12:50 {code} > This was noted as a symptom of FLINK-22980, but doesn't have the same failure. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34921) SystemProcessingTimeServiceTest fails due to missing output
Matthias Pohl created FLINK-34921: - Summary: SystemProcessingTimeServiceTest fails due to missing output Key: FLINK-34921 URL: https://issues.apache.org/jira/browse/FLINK-34921 Project: Flink Issue Type: Bug Components: API / DataStream Affects Versions: 1.20.0 Reporter: Matthias Pohl This PR CI build with {{AdaptiveScheduler}} enabled failed: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58476=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=11224 {code} "ForkJoinPool-61-worker-25" #863 daemon prio=5 os_prio=0 tid=0x7f8c19eba000 nid=0x60a5 waiting on condition [0x7f8bc2cf9000] Mar 21 17:19:42java.lang.Thread.State: WAITING (parking) Mar 21 17:19:42 at sun.misc.Unsafe.park(Native Method) Mar 21 17:19:42 - parking to wait for <0xd81959b8> (a java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) Mar 21 17:19:42 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) Mar 21 17:19:42 at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429) Mar 21 17:19:42 at java.util.concurrent.FutureTask.get(FutureTask.java:191) Mar 21 17:19:42 at org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeServiceTest$$Lambda$1443/1477662666.call(Unknown Source) Mar 21 17:19:42 at org.assertj.core.api.ThrowableAssert.catchThrowable(ThrowableAssert.java:63) Mar 21 17:19:42 at org.assertj.core.api.AssertionsForClassTypes.catchThrowable(AssertionsForClassTypes.java:892) Mar 21 17:19:42 at org.assertj.core.api.Assertions.catchThrowable(Assertions.java:1366) Mar 21 17:19:42 at org.assertj.core.api.Assertions.assertThatThrownBy(Assertions.java:1210) Mar 21 17:19:42 at org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeServiceTest.testQuiesceAndAwaitingCancelsScheduledAtFixRateFuture(SystemProcessingTimeServiceTest.java:92) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34911) ChangelogRecoveryRescaleITCase failed fatally with 127 exit code
[ https://issues.apache.org/jira/browse/FLINK-34911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34911: -- Component/s: Runtime / State Backends > ChangelogRecoveryRescaleITCase failed fatally with 127 exit code > > > Key: FLINK-34911 > URL: https://issues.apache.org/jira/browse/FLINK-34911 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Affects Versions: 1.20.0 >Reporter: Ryan Skraba >Priority: Major > Labels: test-stability > > [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58455=logs=a657ddbf-d986-5381-9649-342d9c92e7fb=dc085d4a-05c8-580e-06ab-21f5624dab16=9029] > > {code:java} > Mar 21 01:50:42 01:50:42.553 [ERROR] Command was /bin/sh -c cd > '/__w/1/s/flink-tests' && '/usr/lib/jvm/jdk-21.0.1+12/bin/java' > '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' > '--add-opens=java.base/java.util=ALL-UNNAMED' > '--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar' > '/__w/1/s/flink-tests/target/surefire/surefirebooter-20240321010847189_810.jar' > '/__w/1/s/flink-tests/target/surefire' '2024-03-21T01-08-44_720-jvmRun3' > 'surefire-20240321010847189_808tmp' 'surefire_207-20240321010847189_809tmp' > Mar 21 01:50:42 01:50:42.553 [ERROR] Error occurred in starting fork, check > output in log > Mar 21 01:50:42 01:50:42.553 [ERROR] Process Exit Code: 127 > Mar 21 01:50:42 01:50:42.553 [ERROR] Crashed tests: > Mar 21 01:50:42 01:50:42.553 [ERROR] > org.apache.flink.test.checkpointing.ChangelogRecoveryRescaleITCase > Mar 21 01:50:42 01:50:42.553 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456) > Mar 21 01:50:42 01:50:42.553 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:418) > Mar 21 01:50:42 01:50:42.553 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:297) > Mar 21 01:50:42 01:50:42.553 
[ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:250) > Mar 21 01:50:42 01:50:42.554 [ERROR] at > org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1240) > {code} > From the watchdog, only {{ChangelogRecoveryRescaleITCase}} didn't complete, > specifically parameterized with an {{EmbeddedRocksDBStateBackend}} with > incremental checkpointing enabled. > The base class ({{{}ChangelogRecoveryITCaseBase{}}}) starts a > {{MiniClusterWithClientResource}} > {code:java} > ~/Downloads/CI/logs-cron_jdk21-test_cron_jdk21_tests-1710982836$ cat > watchdog| grep "Tests run\|Running org.apache.flink" | grep -o > "org.apache.flink[^ ]*$" | sort | uniq -c | sort -n | head > 1 org.apache.flink.test.checkpointing.ChangelogRecoveryRescaleITCase > 2 org.apache.flink.api.connector.source.lib.NumberSequenceSourceITCase > {code} > > {color:#00} {color} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34911) ChangelogRecoveryRescaleITCase failed fatally with 127 exit code
[ https://issues.apache.org/jira/browse/FLINK-34911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-34911: -- Priority: Critical (was: Major) > ChangelogRecoveryRescaleITCase failed fatally with 127 exit code > > > Key: FLINK-34911 > URL: https://issues.apache.org/jira/browse/FLINK-34911 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Affects Versions: 1.20.0 >Reporter: Ryan Skraba >Priority: Critical > Labels: test-stability > > [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58455=logs=a657ddbf-d986-5381-9649-342d9c92e7fb=dc085d4a-05c8-580e-06ab-21f5624dab16=9029] > > {code:java} > Mar 21 01:50:42 01:50:42.553 [ERROR] Command was /bin/sh -c cd > '/__w/1/s/flink-tests' && '/usr/lib/jvm/jdk-21.0.1+12/bin/java' > '-XX:+UseG1GC' '-Xms256m' '-XX:+IgnoreUnrecognizedVMOptions' > '--add-opens=java.base/java.util=ALL-UNNAMED' > '--add-opens=java.base/java.io=ALL-UNNAMED' '-Xmx1536m' '-jar' > '/__w/1/s/flink-tests/target/surefire/surefirebooter-20240321010847189_810.jar' > '/__w/1/s/flink-tests/target/surefire' '2024-03-21T01-08-44_720-jvmRun3' > 'surefire-20240321010847189_808tmp' 'surefire_207-20240321010847189_809tmp' > Mar 21 01:50:42 01:50:42.553 [ERROR] Error occurred in starting fork, check > output in log > Mar 21 01:50:42 01:50:42.553 [ERROR] Process Exit Code: 127 > Mar 21 01:50:42 01:50:42.553 [ERROR] Crashed tests: > Mar 21 01:50:42 01:50:42.553 [ERROR] > org.apache.flink.test.checkpointing.ChangelogRecoveryRescaleITCase > Mar 21 01:50:42 01:50:42.553 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:456) > Mar 21 01:50:42 01:50:42.553 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:418) > Mar 21 01:50:42 01:50:42.553 [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:297) > Mar 21 01:50:42 01:50:42.553 
[ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:250) > Mar 21 01:50:42 01:50:42.554 [ERROR] at > org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1240) > {code} > From the watchdog, only {{ChangelogRecoveryRescaleITCase}} didn't complete, > specifically parameterized with an {{EmbeddedRocksDBStateBackend}} with > incremental checkpointing enabled. > The base class ({{{}ChangelogRecoveryITCaseBase{}}}) starts a > {{MiniClusterWithClientResource}} > {code:java} > ~/Downloads/CI/logs-cron_jdk21-test_cron_jdk21_tests-1710982836$ cat > watchdog| grep "Tests run\|Running org.apache.flink" | grep -o > "org.apache.flink[^ ]*$" | sort | uniq -c | sort -n | head > 1 org.apache.flink.test.checkpointing.ChangelogRecoveryRescaleITCase > 2 org.apache.flink.api.connector.source.lib.NumberSequenceSourceITCase > {code} > > {color:#00} {color} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-34643) JobIDLoggingITCase failed
[ https://issues.apache.org/jira/browse/FLINK-34643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829485#comment-17829485 ] Matthias Pohl edited comment on FLINK-34643 at 3/21/24 11:13 AM: - * https://github.com/apache/flink/actions/runs/8290287716/job/22688325865#step:10:9328 * https://github.com/apache/flink/actions/runs/8304571223/job/22730531076#step:10:9194 * https://github.com/apache/flink/actions/runs/8312246651/job/22747312383#step:10:8539 * https://github.com/apache/flink/actions/runs/8320242443/job/22764925776#step:10:8913 * https://github.com/apache/flink/actions/runs/8320242443/job/22764920830#step:10:8727 * https://github.com/apache/flink/actions/runs/8320242443/job/22764903331#step:10:9336 * https://github.com/apache/flink/actions/runs/8336454518/job/22813901357#step:10:8952 * https://github.com/apache/flink/actions/runs/8336454518/job/22813876201#step:10:9327 * https://github.com/apache/flink/actions/runs/8352823788/job/22863786799#step:10:8952 * https://github.com/apache/flink/actions/runs/8352823788/job/22863772571#step:10:9337 * https://github.com/apache/flink/actions/runs/8368626493/job/22913270846#step:10:8418 was (Author: mapohl): * https://github.com/apache/flink/actions/runs/8290287716/job/22688325865#step:10:9328 * https://github.com/apache/flink/actions/runs/8304571223/job/22730531076#step:10:9194 * https://github.com/apache/flink/actions/runs/8312246651/job/22747312383#step:10:8539 * https://github.com/apache/flink/actions/runs/8320242443/job/22764925776#step:10:8913 * https://github.com/apache/flink/actions/runs/8320242443/job/22764920830#step:10:8727 * https://github.com/apache/flink/actions/runs/8320242443/job/22764903331#step:10:9336 * https://github.com/apache/flink/actions/runs/8336454518/job/22813901357#step:10:8952 * https://github.com/apache/flink/actions/runs/8336454518/job/22813876201#step:10:9327 * https://github.com/apache/flink/actions/runs/8352823788/job/22863786799#step:10:8952 * 
https://github.com/apache/flink/actions/runs/8352823788/job/22863772571#step:10:9337 > JobIDLoggingITCase failed > - > > Key: FLINK-34643 > URL: https://issues.apache.org/jira/browse/FLINK-34643 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.20.0 >Reporter: Matthias Pohl >Assignee: Roman Khachatryan >Priority: Major > Labels: pull-request-available, test-stability > Fix For: 1.20.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=7897 > {code} > Mar 09 01:24:23 01:24:23.498 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 4.209 s <<< FAILURE! -- in > org.apache.flink.test.misc.JobIDLoggingITCase > Mar 09 01:24:23 01:24:23.498 [ERROR] > org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(ClusterClient) > -- Time elapsed: 1.459 s <<< ERROR! > Mar 09 01:24:23 java.lang.IllegalStateException: Too few log events recorded > for org.apache.flink.runtime.jobmaster.JobMaster (12) - this must be a bug in > the test code > Mar 09 01:24:23 at > org.apache.flink.util.Preconditions.checkState(Preconditions.java:215) > Mar 09 01:24:23 at > org.apache.flink.test.misc.JobIDLoggingITCase.assertJobIDPresent(JobIDLoggingITCase.java:148) > Mar 09 01:24:23 at > org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(JobIDLoggingITCase.java:132) > Mar 09 01:24:23 at java.lang.reflect.Method.invoke(Method.java:498) > Mar 09 01:24:23 at > java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) > Mar 09 
01:24:23 > {code} > The other test failures of this build were also caused by the same test: > * > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=2c3cbe13-dee0-5837-cf47-3053da9a8a78=b78d9d30-509a-5cea-1fef-db7abaa325ae=8349 > * > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=8209 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33186) CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished fails on AZP
[ https://issues.apache.org/jira/browse/FLINK-33186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829501#comment-17829501 ] Matthias Pohl commented on FLINK-33186: --- https://github.com/apache/flink/actions/runs/8369823390/job/22916375709#step:10:7894 > CheckpointAfterAllTasksFinishedITCase.testRestoreAfterSomeTasksFinished > fails on AZP > - > > Key: FLINK-33186 > URL: https://issues.apache.org/jira/browse/FLINK-33186 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.19.0, 1.18.1 >Reporter: Sergey Nuyanzin >Assignee: Jiang Xin >Priority: Critical > Labels: test-stability > > This build > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=53509=logs=baf26b34-3c6a-54e8-f93f-cf269b32f802=8c9d126d-57d2-5a9e-a8c8-ff53f7b35cd9=8762 > fails as > {noformat} > Sep 28 01:23:43 Caused by: > org.apache.flink.runtime.checkpoint.CheckpointException: Task local > checkpoint failure. > Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.PendingCheckpoint.abort(PendingCheckpoint.java:550) > Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2248) > Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:2235) > Sep 28 01:23:43 at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:817) > Sep 28 01:23:43 at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > Sep 28 01:23:43 at > java.util.concurrent.FutureTask.run(FutureTask.java:266) > Sep 28 01:23:43 at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > Sep 28 01:23:43 at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > Sep 28 01:23:43 at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > Sep 28 01:23:43 at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > Sep 28 01:23:43 at java.lang.Thread.run(Thread.java:748) > {noformat}
[jira] [Commented] (FLINK-28440) EventTimeWindowCheckpointingITCase failed with restore
[ https://issues.apache.org/jira/browse/FLINK-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829500#comment-17829500 ] Matthias Pohl commented on FLINK-28440: --- https://github.com/apache/flink/actions/runs/8360441603/job/22886656534#step:10:7536 > EventTimeWindowCheckpointingITCase failed with restore > -- > > Key: FLINK-28440 > URL: https://issues.apache.org/jira/browse/FLINK-28440 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing, Runtime / State Backends >Affects Versions: 1.16.0, 1.17.0, 1.18.0, 1.19.0 >Reporter: Huang Xingbo >Assignee: Yanfei Lei >Priority: Critical > Labels: auto-deprioritized-critical, pull-request-available, > stale-assigned, test-stability > Fix For: 1.20.0 > > Attachments: image-2023-02-01-00-51-54-506.png, > image-2023-02-01-01-10-01-521.png, image-2023-02-01-01-19-12-182.png, > image-2023-02-01-16-47-23-756.png, image-2023-02-01-16-57-43-889.png, > image-2023-02-02-10-52-56-599.png, image-2023-02-03-10-09-07-586.png, > image-2023-02-03-12-03-16-155.png, image-2023-02-03-12-03-56-614.png > > > {code:java} > Caused by: java.lang.Exception: Exception while creating > StreamOperatorStateContext. 
> at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:256) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:268) > at > org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:722) > at > org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:698) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:665) > at > org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935) > at > org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:904) > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for WindowOperator_0a448493b4782967b150582570326227_(2/4) from > any of the 1 provided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:353) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:165) > ... 
11 more > Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: > /tmp/junit1835099326935900400/junit1113650082510421526/52ee65b7-033f-4429-8ddd-adbe85e27ced > (No such file or directory) > at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:321) > at > org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.advance(StateChangelogHandleStreamHandleReader.java:87) > at > org.apache.flink.runtime.state.changelog.StateChangelogHandleStreamHandleReader$1.hasNext(StateChangelogHandleStreamHandleReader.java:69) > at > org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.readBackendHandle(ChangelogBackendRestoreOperation.java:96) > at > org.apache.flink.state.changelog.restore.ChangelogBackendRestoreOperation.restore(ChangelogBackendRestoreOperation.java:75) > at > org.apache.flink.state.changelog.ChangelogStateBackend.restore(ChangelogStateBackend.java:92) > at > org.apache.flink.state.changelog.AbstractChangelogStateBackend.createKeyedStateBackend(AbstractChangelogStateBackend.java:136) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:336) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > ... 13 more > Caused by: java.io.FileNotFoundException: >
[jira] [Comment Edited] (FLINK-34227) Job doesn't disconnect from ResourceManager
[ https://issues.apache.org/jira/browse/FLINK-34227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829499#comment-17829499 ] Matthias Pohl edited comment on FLINK-34227 at 3/21/24 11:11 AM: - SetOperatorsITCase: https://github.com/apache/flink/actions/runs/8352823891/job/22863768994#step:10:12399 was (Author: mapohl): https://github.com/apache/flink/actions/runs/8352823891/job/22863768994#step:10:12399 > Job doesn't disconnect from ResourceManager > --- > > Key: FLINK-34227 > URL: https://issues.apache.org/jira/browse/FLINK-34227 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.19.0, 1.18.1 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Critical > Labels: github-actions, pull-request-available, test-stability > Attachments: FLINK-34227.7e7d69daebb438b8d03b7392c9c55115.log, > FLINK-34227.log > > > https://github.com/XComp/flink/actions/runs/7634987973/job/20800205972#step:10:14557 > {code} > [...] > "main" #1 prio=5 os_prio=0 tid=0x7f4b7000 nid=0x24ec0 waiting on > condition [0x7fccce1eb000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xbdd52618> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) > at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2131) > at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2099) > at > 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:2077) > at > org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:876) > at > org.apache.flink.table.planner.runtime.stream.sql.WindowDistinctAggregateITCase.testHopWindow_Cube(WindowDistinctAggregateITCase.scala:550) > [...] > {code}
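The thread dump in FLINK-34227 shows the `main` thread parked in `CompletableFuture.get` inside `StreamExecutionEnvironment.execute`: the client blocks until the job-result future completes, so a job that never disconnects from the ResourceManager leaves the caller in `WAITING (parking)` indefinitely. A plain-JDK sketch (no Flink involved; the future here is a stand-in for the job result) that makes the same blocking behavior observable via a timeout:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Minimal sketch, plain JDK only: StreamExecutionEnvironment.execute() ultimately
// blocks on a future that completes when the job reaches a terminal state. If the
// job never terminates, the calling thread parks forever -- the WAITING (parking)
// state in the thread dump above. Using a bounded get() makes the hang visible.
public class BlockedOnJobResult {

    public static void main(String[] args) throws Exception {
        // Stand-in for a job-result future that is never completed.
        CompletableFuture<String> jobResult = new CompletableFuture<>();

        try {
            // Flink's client effectively calls get() with no timeout here.
            jobResult.get(200, TimeUnit.MILLISECONDS);
            System.out.println("job finished");
        } catch (TimeoutException e) {
            System.out.println("still waiting for job result");
        }
    }
}
```

In the reported hang there is no timeout, which is why the dump shows the thread blocked in `CompletableFuture$Signaller.block` rather than failing fast.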
[jira] [Commented] (FLINK-34227) Job doesn't disconnect from ResourceManager
[ https://issues.apache.org/jira/browse/FLINK-34227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829499#comment-17829499 ] Matthias Pohl commented on FLINK-34227: --- https://github.com/apache/flink/actions/runs/8352823891/job/22863768994#step:10:12399
[jira] [Comment Edited] (FLINK-34643) JobIDLoggingITCase failed
[ https://issues.apache.org/jira/browse/FLINK-34643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829485#comment-17829485 ] Matthias Pohl edited comment on FLINK-34643 at 3/21/24 11:08 AM: - * https://github.com/apache/flink/actions/runs/8290287716/job/22688325865#step:10:9328 * https://github.com/apache/flink/actions/runs/8304571223/job/22730531076#step:10:9194 * https://github.com/apache/flink/actions/runs/8312246651/job/22747312383#step:10:8539 * https://github.com/apache/flink/actions/runs/8320242443/job/22764925776#step:10:8913 * https://github.com/apache/flink/actions/runs/8320242443/job/22764920830#step:10:8727 * https://github.com/apache/flink/actions/runs/8320242443/job/22764903331#step:10:9336 * https://github.com/apache/flink/actions/runs/8336454518/job/22813901357#step:10:8952 * https://github.com/apache/flink/actions/runs/8336454518/job/22813876201#step:10:9327 * https://github.com/apache/flink/actions/runs/8352823788/job/22863786799#step:10:8952 * https://github.com/apache/flink/actions/runs/8352823788/job/22863772571#step:10:9337 was (Author: mapohl): * https://github.com/apache/flink/actions/runs/8290287716/job/22688325865#step:10:9328 * https://github.com/apache/flink/actions/runs/8304571223/job/22730531076#step:10:9194 * https://github.com/apache/flink/actions/runs/8312246651/job/22747312383#step:10:8539 * https://github.com/apache/flink/actions/runs/8320242443/job/22764925776#step:10:8913 * https://github.com/apache/flink/actions/runs/8320242443/job/22764920830#step:10:8727 * https://github.com/apache/flink/actions/runs/8320242443/job/22764903331#step:10:9336 * https://github.com/apache/flink/actions/runs/8336454518/job/22813901357#step:10:8952 * https://github.com/apache/flink/actions/runs/8336454518/job/22813876201#step:10:9327 * https://github.com/apache/flink/actions/runs/8352823788/job/22863786799#step:10:8952 > JobIDLoggingITCase failed > - > > Key: FLINK-34643 > URL: 
https://issues.apache.org/jira/browse/FLINK-34643 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.20.0 >Reporter: Matthias Pohl >Assignee: Roman Khachatryan >Priority: Major > Labels: pull-request-available, test-stability > Fix For: 1.20.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=7897 > {code} > Mar 09 01:24:23 01:24:23.498 [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 4.209 s <<< FAILURE! -- in > org.apache.flink.test.misc.JobIDLoggingITCase > Mar 09 01:24:23 01:24:23.498 [ERROR] > org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(ClusterClient) > -- Time elapsed: 1.459 s <<< ERROR! > Mar 09 01:24:23 java.lang.IllegalStateException: Too few log events recorded > for org.apache.flink.runtime.jobmaster.JobMaster (12) - this must be a bug in > the test code > Mar 09 01:24:23 at > org.apache.flink.util.Preconditions.checkState(Preconditions.java:215) > Mar 09 01:24:23 at > org.apache.flink.test.misc.JobIDLoggingITCase.assertJobIDPresent(JobIDLoggingITCase.java:148) > Mar 09 01:24:23 at > org.apache.flink.test.misc.JobIDLoggingITCase.testJobIDLogging(JobIDLoggingITCase.java:132) > Mar 09 01:24:23 at java.lang.reflect.Method.invoke(Method.java:498) > Mar 09 01:24:23 at > java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) > Mar 09 01:24:23 at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) > Mar 09 01:24:23 > {code} > The other test failures of this build were also caused by the same test: > * > 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=2c3cbe13-dee0-5837-cf47-3053da9a8a78=b78d9d30-509a-5cea-1fef-db7abaa325ae=8349 > * > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58187=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=8209
[jira] [Comment Edited] (FLINK-34718) KeyedPartitionWindowedStream and NonPartitionWindowedStream IllegalStateException in AZP
[ https://issues.apache.org/jira/browse/FLINK-34718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829484#comment-17829484 ] Matthias Pohl edited comment on FLINK-34718 at 3/21/24 11:08 AM: - before the fix was committed to master: * https://github.com/apache/flink/actions/runs/8290287716/job/22688325865#step:10:9329 * https://github.com/apache/flink/actions/runs/8304571223/job/22730531076#step:10:8057 * https://github.com/apache/flink/actions/runs/8312246651/job/22747312383#step:10:9345 * https://github.com/apache/flink/actions/runs/8336454518/job/22813876201#step:10:9330 * https://github.com/apache/flink/actions/runs/8352823788/job/22863772571#step:10:9347 was (Author: mapohl): before the fix was committed to master: * https://github.com/apache/flink/actions/runs/8290287716/job/22688325865#step:10:9329 * https://github.com/apache/flink/actions/runs/8304571223/job/22730531076#step:10:8057 * https://github.com/apache/flink/actions/runs/8312246651/job/22747312383#step:10:9345 * https://github.com/apache/flink/actions/runs/8336454518/job/22813876201#step:10:9330 > KeyedPartitionWindowedStream and NonPartitionWindowedStream > IllegalStateException in AZP > > > Key: FLINK-34718 > URL: https://issues.apache.org/jira/browse/FLINK-34718 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Affects Versions: 1.20.0 >Reporter: Ryan Skraba >Assignee: Ryan Skraba >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.20.0 > > > [https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=58320=logs=8fd9202e-fd17-5b26-353c-ac1ff76c8f28=ea7cf968-e585-52cb-e0fc-f48de023a7ca=9646] > 18 of the KeyedPartitionWindowedStreamITCase and > NonKeyedPartitionWindowedStreamITCase unit tests introduced in FLINK-34543 > are failing in the adaptive scheduler profile, with errors similar to: > {code:java} > Mar 15 01:54:12 Caused by: java.lang.IllegalStateException: The adaptive > scheduler supports pipelined data 
exchanges (violated by MapPartition > (org.apache.flink.streaming.runtime.tasks.OneInputStreamTask) -> > ddb598ad156ed281023ba4eebbe487e3). > Mar 15 01:54:12 at > org.apache.flink.util.Preconditions.checkState(Preconditions.java:215) > Mar 15 01:54:12 at > org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.assertPreconditions(AdaptiveScheduler.java:438) > Mar 15 01:54:12 at > org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.<init>(AdaptiveScheduler.java:356) > Mar 15 01:54:12 at > org.apache.flink.runtime.scheduler.adaptive.AdaptiveSchedulerFactory.createInstance(AdaptiveSchedulerFactory.java:124) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:121) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:384) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:361) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:128) > Mar 15 01:54:12 at > org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:100) > Mar 15 01:54:12 at > org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) > Mar 15 01:54:12 at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) > Mar 15 01:54:12 ... 4 more > {code}
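The FLINK-34718 failure is again a `Preconditions.checkState` rejection: `AdaptiveScheduler.assertPreconditions` refuses any job graph containing a non-pipelined (blocking, batch-style) data exchange. An illustrative plain-JDK sketch (the class and enum names here are hypothetical, not Flink's real internals) of that precondition:

```java
import java.util.List;

// Illustrative sketch only: the adaptive scheduler validates in its constructor
// that every data exchange in the job graph is pipelined, and fails with an
// IllegalStateException -- the error quoted above -- when it finds a blocking one.
public class PipelinedExchangeCheck {

    enum ExchangeMode { PIPELINED, BLOCKING }

    static void assertPreconditions(List<ExchangeMode> exchanges) {
        for (ExchangeMode mode : exchanges) {
            if (mode != ExchangeMode.PIPELINED) {
                throw new IllegalStateException(
                        "The adaptive scheduler supports pipelined data exchanges"
                                + " (violated by a " + mode + " exchange).");
            }
        }
    }

    public static void main(String[] args) {
        try {
            // A MapPartition-style blocking exchange, as in the failing tests.
            assertPreconditions(List.of(ExchangeMode.PIPELINED, ExchangeMode.BLOCKING));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the check runs while the `JobMaster` is being constructed, the tests fail only under the adaptive-scheduler CI profile, not under the default scheduler.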