[jira] [Commented] (FLINK-32700) Support job drain for Savepoint upgrade mode jobs in Flink Operator
[ https://issues.apache.org/jira/browse/FLINK-32700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748784#comment-17748784 ] Gyula Fora commented on FLINK-32700: Here it is:) https://issues.apache.org/jira/browse/FLINK-30529 > Support job drain for Savepoint upgrade mode jobs in Flink Operator > --- > > Key: FLINK-32700 > URL: https://issues.apache.org/jira/browse/FLINK-32700 > Project: Flink > Issue Type: Improvement > Components: Kubernetes Operator >Affects Versions: kubernetes-operator-1.5.0 >Reporter: Manan Mangal >Assignee: Manan Mangal >Priority: Major > > During cancel job with savepoint upgrade mode, jobs can be allowed to drain > by advancing the watermark to the end, before they are stopped, so that the > in-flight data is not lost. > If the job fails to drain and hits timeout or any other error, it can be > cancelled without taking a savepoint. -- This message was sent by Atlassian Jira (v8.20.10#820010)
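The "drain by advancing the watermark to the end" semantics from the ticket description can be illustrated with a small self-contained Java sketch. This is not operator code; the priority queue here is a stand-in for Flink's event-time timer service. The point is that advancing the watermark to Long.MAX_VALUE fires every pending event-time timer before the job is stopped, so no in-flight event-time state is silently dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch of "drain" semantics (not Flink operator code):
// draining means advancing the watermark to Long.MAX_VALUE before the stop,
// so every pending event-time timer fires instead of being discarded.
public class DrainSketch {
    private final PriorityQueue<Long> timers = new PriorityQueue<>();
    private final List<Long> fired = new ArrayList<>();

    public void registerTimer(long ts) {
        timers.add(ts);
    }

    // Advancing the watermark fires all timers at or below it.
    public void advanceWatermark(long watermark) {
        while (!timers.isEmpty() && timers.peek() <= watermark) {
            fired.add(timers.poll());
        }
    }

    public List<Long> stopWithDrain() {
        advanceWatermark(Long.MAX_VALUE); // the "drain": flush everything
        return fired;
    }

    public static void main(String[] args) {
        DrainSketch job = new DrainSketch();
        job.registerTimer(10L);
        job.registerTimer(20L);
        job.advanceWatermark(15L);              // only the first timer fires
        System.out.println(job.stopWithDrain()); // [10, 20]
    }
}
```

Without the final `advanceWatermark(Long.MAX_VALUE)` step, the timer at 20 would still be pending when the job stops, which is exactly the in-flight data loss the ticket describes.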
[jira] [Commented] (FLINK-32700) Support job drain for Savepoint upgrade mode jobs in Flink Operator
[ https://issues.apache.org/jira/browse/FLINK-32700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748783#comment-17748783 ] Gyula Fora commented on FLINK-32700: Don't get me wrong, I think adding support for draining makes perfect sense and we should start with that :) We can discuss the savepoint timeout behavior in a separate ticket. It sounds a little strange that you want to drain pipelines to avoid losing data but would be OK with losing data on a savepoint timeout. I think if the job has started to fail we should simply cancel/delete. I agree there is a related ticket somewhere; I will try to dig it out after I come back from vacation on Wednesday.
[GitHub] [flink-connector-rabbitmq] RocMarshal commented on pull request #1: [FLINK-20628] RabbitMQ Connector using FLIP-27 Source API
RocMarshal commented on PR #1: URL: https://github.com/apache/flink-connector-rabbitmq/pull/1#issuecomment-1656547551 > > I am working for it now. > > Any update to report from your end on this @RocMarshal ? Hi, @MartijnVisser Still working on it; I'll share an update by the end of this week. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] RocMarshal commented on pull request #20990: [FLINK-29050][JUnit5 Migration] Module: flink-hadoop-compatibility
RocMarshal commented on PR #20990: URL: https://github.com/apache/flink/pull/20990#issuecomment-1656535485 Hi @1996fanrui, would you mind helping review this PR if you have free time? Thank you very much~
[jira] [Comment Edited] (FLINK-32700) Support job drain for Savepoint upgrade mode jobs in Flink Operator
[ https://issues.apache.org/jira/browse/FLINK-32700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748759#comment-17748759 ] Talat Uyarer edited comment on FLINK-32700 at 7/29/23 1:36 AM: --- [~gyfora] Currently there is an issue with the Operator's delete. We want to drain jobs because when we call {code:java} kubectl delete flinkdeployment{code} the Operator deletes immediately for stateless/stateful jobs, so we lose in-flight data. I believe by default the Operator should delete jobs by emitting the max watermark. However, emitting the max watermark also has an issue: if the sink is stuck, we cannot delete the job and we have created a deadlock situation. Regardless of which FlinkDeployment upgrade mode we use, with last-state/savepoint we should drain in-flight data to prevent unnecessary data duplication. We do not silently cancel the jobs; we actually wait until the savepoint/checkpoint timeout when a user deletes their FlinkDeployment. In the current situation the Operator does not even wait for the timeout; it deletes immediately. We would like to follow your suggestion, but please keep in mind that we have 20K+ stateful/stateless jobs that are triggered programmatically; there is no way for us to change their deployments manually. cc [~mmangal]
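The fallback behavior discussed in this thread (drain with a savepoint first; on timeout or error, cancel without one) can be sketched in plain Java. This is a hedged, self-contained sketch, not operator code: `triggerDrain` is a hypothetical stand-in for the real stop-with-savepoint call, and the savepoint path in `main` is a placeholder.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of the proposed shutdown policy: attempt a drain (stop with
// savepoint) within a timeout; if the drain hangs (e.g. a stuck sink) or
// fails, fall back to cancelling without a savepoint so deletion never
// deadlocks.
public class DrainWithFallback {
    public static String shutDown(CompletableFuture<String> triggerDrain, Duration timeout) {
        try {
            // Happy path: drain completed, a savepoint path comes back.
            return "drained:" + triggerDrain.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            // Stuck sink or drain error: give up on the savepoint.
            return "cancelled-without-savepoint";
        }
    }

    public static void main(String[] args) {
        // Drain completes in time (placeholder savepoint path):
        System.out.println(shutDown(
                CompletableFuture.completedFuture("s3://sp/1"), Duration.ofSeconds(1)));
        // Sink stuck: the drain future never completes, so we fall back:
        System.out.println(shutDown(new CompletableFuture<>(), Duration.ofMillis(50)));
    }
}
```

The second call is the deadlock scenario raised above: the drain never finishes, and the timeout is what keeps the deletion from hanging forever.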
[GitHub] [flink] netvl commented on pull request #22854: [FLINK-32137] Added filtering of lambdas when building the flame graph
netvl commented on PR #22854: URL: https://github.com/apache/flink/pull/22854#issuecomment-1656406321 @1996fanrui sorry for the delay! I fixed the checkstyle issues (`mvn checkstyle:check` now passes locally), and I fixed the picture and the description of the PR. Please review, thanks!
[jira] [Commented] (FLINK-20767) add nested field support for SupportsFilterPushDown
[ https://issues.apache.org/jira/browse/FLINK-20767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748687#comment-17748687 ] Venkata krishnan Sowrirajan commented on FLINK-20767: - I am currently looking into this issue; our internal use case has lots of nested fields. Based on experiments with both Spark and Flink, queries are very slow without nested-field filter pushdown. > add nested field support for SupportsFilterPushDown > --- > > Key: FLINK-20767 > URL: https://issues.apache.org/jira/browse/FLINK-20767 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Planner >Affects Versions: 1.12.0 >Reporter: Jun Zhang >Priority: Major > Fix For: 1.18.0 > > > I think we should add the nested field support for SupportsFilterPushDown
[jira] [Commented] (FLINK-32710) The LeaderElection component IDs for running is only the JobID which might be confusing in the log output
[ https://issues.apache.org/jira/browse/FLINK-32710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748644#comment-17748644 ] Jiadong Lu commented on FLINK-32710: Hi, Matthias. If I understand correctly, this ticket is about adding an ID prefix that identifies the component, like zk-\{JobId}? If so, I would like to fix it; please assign it to me. > The LeaderElection component IDs for running is only the JobID which might be > confusing in the log output > - > > Key: FLINK-32710 > URL: https://issues.apache.org/jira/browse/FLINK-32710 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.18.0 >Reporter: Matthias Pohl >Priority: Minor > Labels: starter > > I noticed that the leader log messages for the jobs use the plain job ID as > the component ID. That might be confusing when reading the logs since it's a > UUID with no additional context. > We might want to add a prefix (e.g. {{job-}} to these component IDs.)
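The change the ticket proposes is tiny; a hedged sketch follows (hypothetical helper, not the actual leader-election code — `job-` is the prefix the ticket itself suggests):

```java
import java.util.UUID;

// Sketch of FLINK-32710's proposal: prefix the bare job UUID with a
// component marker so leader-election log lines carry context.
public class ComponentIdSketch {
    public static String componentId(UUID jobId) {
        return "job-" + jobId; // "job-" prefix as suggested in the ticket
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("00000000-0000-0000-0000-000000000001");
        System.out.println(componentId(id)); // job-00000000-0000-0000-0000-000000000001
    }
}
```

A log line reading `job-<uuid>` is immediately recognizable as a per-job leader-election component, whereas a bare UUID is not.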
[jira] [Updated] (FLINK-32711) Type mismatch when proctime function used as parameter
[ https://issues.apache.org/jira/browse/FLINK-32711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aitozi updated FLINK-32711: --- Issue Type: Bug (was: Improvement) > Type mismatch when proctime function used as parameter > -- > > Key: FLINK-32711 > URL: https://issues.apache.org/jira/browse/FLINK-32711 > Project: Flink > Issue Type: Bug > Components: Table SQL / Planner >Affects Versions: 1.18.0 >Reporter: Aitozi >Priority: Major > > reproduce case: > {code:sql} > SELECT TYPEOF(PROCTIME()) > {code} > this query will fail with > org.apache.flink.table.planner.codegen.CodeGenException: Mismatch of > function's argument data type 'TIMESTAMP_LTZ(3) NOT NULL' and actual argument > type 'TIMESTAMP_LTZ(3)'.
[jira] [Commented] (FLINK-32711) Type mismatch when proctime function used as parameter
[ https://issues.apache.org/jira/browse/FLINK-32711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748626#comment-17748626 ] Aitozi commented on FLINK-32711: It's caused by the hard-coded nullable result type of the PROCTIME_MATERIALIZE codegen.
[jira] [Created] (FLINK-32711) Type mismatch when proctime function used as parameter
Aitozi created FLINK-32711: -- Summary: Type mismatch when proctime function used as parameter Key: FLINK-32711 URL: https://issues.apache.org/jira/browse/FLINK-32711 Project: Flink Issue Type: Improvement Components: Table SQL / Planner Affects Versions: 1.18.0 Reporter: Aitozi reproduce case: {code:sql} SELECT TYPEOF(PROCTIME()) {code} this query will fail with org.apache.flink.table.planner.codegen.CodeGenException: Mismatch of function's argument data type 'TIMESTAMP_LTZ(3) NOT NULL' and actual argument type 'TIMESTAMP_LTZ(3)'.
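The mismatch in this ticket boils down to a nullability difference between two otherwise identical types. A minimal stand-in (a hypothetical `LogicalType` record, not the planner's actual class) shows why a strict type-equality check in codegen fails:

```java
// Sketch of the mismatch in FLINK-32711: PROCTIME() is declared
// TIMESTAMP_LTZ(3) NOT NULL, but the materialization codegen hard-codes a
// nullable TIMESTAMP_LTZ(3), so an exact type comparison fails.
record LogicalType(String name, boolean nullable) {}

public class TypeMismatchSketch {
    public static void main(String[] args) {
        LogicalType declared = new LogicalType("TIMESTAMP_LTZ(3)", false);     // NOT NULL
        LogicalType materialized = new LogicalType("TIMESTAMP_LTZ(3)", true);  // hard-coded nullable
        System.out.println(declared.equals(materialized)); // false
    }
}
```

The `false` here corresponds to the `CodeGenException` in the report: the types agree in everything except nullability, and that single flag is enough to break the check.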
[jira] [Updated] (FLINK-32681) RocksDBStateDownloaderTest.testMultiThreadCleanupOnFailure unstable
[ https://issues.apache.org/jira/browse/FLINK-32681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-32681: -- Issue Type: Bug (was: Technical Debt) > RocksDBStateDownloaderTest.testMultiThreadCleanupOnFailure unstable > > > Key: FLINK-32681 > URL: https://issues.apache.org/jira/browse/FLINK-32681 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends, Tests >Affects Versions: 1.18.0 >Reporter: Chesnay Schepler >Priority: Critical > Labels: test-stability > Fix For: 1.18.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51712=logs=77a9d8e1-d610-59b3-fc2a-4766541e0e33=125e07e7-8de0-5c6c-a541-a567415af3ef > Failed 3 times in yesterday's nightly run. > {code} > Jul 26 01:12:46 01:12:46.889 [ERROR] > org.apache.flink.contrib.streaming.state.RocksDBStateDownloaderTest.testMultiThreadCleanupOnFailure > Time elapsed: 0.044 s <<< FAILURE! > Jul 26 01:12:46 java.lang.AssertionError > Jul 26 01:12:46 at org.junit.Assert.fail(Assert.java:87) > Jul 26 01:12:46 at org.junit.Assert.assertTrue(Assert.java:42) > Jul 26 01:12:46 at org.junit.Assert.assertFalse(Assert.java:65) > Jul 26 01:12:46 at org.junit.Assert.assertFalse(Assert.java:75) > Jul 26 01:12:46 at > org.apache.flink.contrib.streaming.state.RocksDBStateDownloaderTest.testMultiThreadCleanupOnFailure(RocksDBStateDownloaderTest.java:151) > {code}
[jira] [Commented] (FLINK-32681) RocksDBStateDownloaderTest.testMultiThreadCleanupOnFailure unstable
[ https://issues.apache.org/jira/browse/FLINK-32681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748609#comment-17748609 ] Matthias Pohl commented on FLINK-32681: --- I'm linking FLINK-32345 as a cause because that failing test was added in that issue. [~srichter] can you take this one? I raised the priority to Critical as well.
[jira] [Updated] (FLINK-32681) RocksDBStateDownloaderTest.testMultiThreadCleanupOnFailure unstable
[ https://issues.apache.org/jira/browse/FLINK-32681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-32681: -- Priority: Critical (was: Major)
[jira] [Commented] (FLINK-32662) JobMasterTest.testRetrievingCheckpointStats fails with NPE on AZP
[ https://issues.apache.org/jira/browse/FLINK-32662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748607#comment-17748607 ] Matthias Pohl commented on FLINK-32662: --- https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51777=logs=0e7be18f-84f2-53f0-a32d-4a5e4a174679=7c1d86e3-35bd-5fd5-3b7c-30c126a78702=7282 > JobMasterTest.testRetrievingCheckpointStats fails with NPE on AZP > - > > Key: FLINK-32662 > URL: https://issues.apache.org/jira/browse/FLINK-32662 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.18.0 >Reporter: Sergey Nuyanzin >Assignee: Hong Liang Teoh >Priority: Critical > Labels: test-stability > > This build > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51452=logs=0e7be18f-84f2-53f0-a32d-4a5e4a174679=7c1d86e3-35bd-5fd5-3b7c-30c126a78702=8654 > fails with NPE as > {noformat} > Jul 20 01:01:33 01:01:33.491 [ERROR] > org.apache.flink.runtime.jobmaster.JobMasterTest.testRetrievingCheckpointStats > Time elapsed: 0.036 s <<< ERROR! 
> Jul 20 01:01:33 java.lang.NullPointerException > Jul 20 01:01:33 at > org.apache.flink.runtime.jobmaster.JobMasterTest.testRetrievingCheckpointStats(JobMasterTest.java:2132) > Jul 20 01:01:33 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > Jul 20 01:01:33 at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > Jul 20 01:01:33 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Jul 20 01:01:33 at java.lang.reflect.Method.invoke(Method.java:498) > Jul 20 01:01:33 at > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727) > Jul 20 01:01:33 at > org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > Jul 20 01:01:33 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > Jul 20 01:01:33 at > org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156) > Jul 20 01:01:33 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147) > Jul 20 01:01:33 at > org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86) > Jul 20 01:01:33 at > org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103) > Jul 20 01:01:33 at > org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93) > Jul 20 01:01:33 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > Jul 20 01:01:33 at > org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > Jul 20 01:01:33 at > 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > ... > {noformat}
[jira] [Commented] (FLINK-32681) RocksDBStateDownloaderTest.testMultiThreadCleanupOnFailure unstable
[ https://issues.apache.org/jira/browse/FLINK-32681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748606#comment-17748606 ] Matthias Pohl commented on FLINK-32681: --- twice in the same build: * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51777=logs=0da23115-68bb-5dcd-192c-bd4c8adebde1=24c3384f-1bcb-57b3-224f-51bf973bbee8=10286 * https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51777=logs=77a9d8e1-d610-59b3-fc2a-4766541e0e33=125e07e7-8de0-5c6c-a541-a567415af3ef=10241
[GitHub] [flink] luoyuxia commented on a diff in pull request #23013: [FLINK-32620][sink] Migrate DiscardingSink to sinkv2.
luoyuxia commented on code in PR #23013: URL: https://github.com/apache/flink/pull/23013#discussion_r1277459386 ## flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/v2/DiscardingSink.java: ## @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.streaming.api.functions.sink.v2; + +import org.apache.flink.annotation.PublicEvolving; +import org.apache.flink.api.common.SupportsConcurrentExecutionAttempts; +import org.apache.flink.api.connector.sink2.Sink; +import org.apache.flink.api.connector.sink2.SinkWriter; + +import java.io.IOException; + +/** + * A special sink that ignores all elements. + * + * @param The type of elements received by the sink. + */ +@PublicEvolving Review Comment: Just wondering is there any dicuss or vote for making it to `PublicEvolving`. 
## flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/table/stream/StreamingSink.java: ## @@ -173,7 +173,7 @@ public static DataStreamSink sink( } DataStreamSink discardingSink = -stream.addSink(new DiscardingSink<>()).name("end").setParallelism(1); +stream.sinkTo(new DiscardingSink<>()).name("end").setParallelism(1); Review Comment: Do you mean the `json_execution_plan` or the `execNode plan` described in [FLIP-190](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489#FLIP190:SupportVersionUpgradesforTableAPI)? As long as we keep the `execNode plan` the same, compatibility won't break. As far as I can see, the `execNode plan` is not changed by this PR, so I don't think it will break compatibility. ## flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/translators/SinkTransformationTranslator.java: ## @@ -315,6 +315,13 @@ private R adjustTransformations( Transformation::getDescription, Transformation::setDescription); +// handle coLocationGroupKey. Review Comment: Why change it in this PR?
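The sink being migrated in this PR can be shown with a self-contained analog. The interfaces below are simplified stand-ins, not Flink's actual Sink V2 API: the pattern is simply a sink whose writer drops every element it receives.

```java
// Simplified stand-ins for the Sink V2 shape (not Flink's real interfaces).
interface SinkWriter<T> {
    void write(T element);
}

interface Sink<T> {
    SinkWriter<T> createWriter();
}

// Analog of the DiscardingSink in the diff: the writer ignores all elements.
class DiscardingSink<T> implements Sink<T> {
    @Override
    public SinkWriter<T> createWriter() {
        return element -> { /* intentionally drop the element */ };
    }
}

public class DiscardingSinkSketch {
    public static void main(String[] args) {
        SinkWriter<String> writer = new DiscardingSink<String>().createWriter();
        writer.write("ignored"); // no output, no state: the element is dropped
    }
}
```

Such a sink is useful as a terminal "end" node (as in the `StreamingSink` diff above) when a pipeline must be closed off but its output is irrelevant.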
[GitHub] [flink] Tartarus0zm commented on pull request #23060: [FLINK-32519][docs] Add doc for [CREATE OR] REPLACE TABLE AS statement
Tartarus0zm commented on PR #23060: URL: https://github.com/apache/flink/pull/23060#issuecomment-1655600364 @luoyuxia thanks for your review! PTAL
[jira] [Commented] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
[ https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748584#comment-17748584 ] Matthias Pohl commented on FLINK-31168: --- (y) I was able to reproduce this test failure using JDK17 (after 3 runs and after 74 runs). I'm gonna go ahead and investigate it further. > JobManagerHAProcessFailureRecoveryITCase failed due to job not being found > -- > > Key: FLINK-31168 > URL: https://issues.apache.org/jira/browse/FLINK-31168 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.3, 1.16.1, 1.18.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Blocker > Labels: test-stability > Fix For: 1.18.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706 > We see this build failure because a job couldn't be found: > {code} > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:319) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:958) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase.testJobManagerFailure(JobManagerHAProcessFailureRecoveryITCase.java:235) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase$4.run(JobManagerHAProcessFailureRecoveryITCase.java:336) > Caused by: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > at > 
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056) > ... 4 more > Caused by: java.lang.RuntimeException: Error while waiting for job to be > initialized > at > org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:160) > at > org.apache.flink.client.deployment.executors.AbstractSessionClusterExecutor.lambda$execute$2(AbstractSessionClusterExecutor.java:82) > at > org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73) > at > java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642) > at > java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479) > at > java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) > at > java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) > at > java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) > at > java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) > at > java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) > Caused by: java.util.concurrent.ExecutionException: > org.apache.flink.runtime.rest.util.RestClientException: > [org.apache.flink.runtime.rest.NotFoundException: Job > 865dcd87f4828dbeb3d93eb52e2636b1 not found > at > org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$1(AbstractExecutionGraphHandler.java:99) > at > java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986) > at > java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970) > at > java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > at > 
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > at > org.apache.flink.runtime.rest.handler.legacy.DefaultExecutionGraphCache.lambda$getExecutionGraphInternal$0(DefaultExecutionGraphCache.java:109) > at > java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) > at > java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) > at > java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > at >
[GitHub] [flink] luoyuxia commented on a diff in pull request #23060: [FLINK-32519][docs] Add doc for [CREATE OR] REPLACE TABLE AS statement
luoyuxia commented on code in PR #23060: URL: https://github.com/apache/flink/pull/23060#discussion_r1277438779 ## docs/content/docs/dev/table/sql/create.md: ## @@ -557,6 +558,59 @@ INSERT INTO my_ctas_table SELECT id, name, age FROM source_table WHERE mod(id, 1 {{< top >}} +## [CREATE OR] REPLACE TABLE +```sql +[CREATE OR] REPLACE TABLE [catalog_name.][db_name.]table_name +[COMMENT table_comment] +WITH (key1=val1, key2=val2, ...) +AS select_query +``` + +**Note** RTAS has the following semantic: +* REPLACE TABLE AS SELECT statement: the target table to be replaced must exist, otherwise, an exception will be thrown. +* CREATE OR REPLACE TABLE AS SELECT statement: the target table to be replaced will be created if it does not exist; if it does exist, it'll be replaced. + +Tables can also be replaced(or created) and populated by the results of a query in one [CREATE OR] REPLACE TABLE(RTAS) statement. RTAS is the simplest and fastest way to replace and insert data into a table with a single command. Review Comment: ```suggestion Tables can also be replaced(or created) and populated by the results of a query in one [CREATE OR] REPLACE TABLE AS SELECT(RTAS) statement. RTAS is the simplest and fastest way to replace(or create) and insert data into a table with a single command. ``` ## docs/content/docs/dev/table/sql/create.md: ## @@ -557,6 +558,59 @@ INSERT INTO my_ctas_table SELECT id, name, age FROM source_table WHERE mod(id, 1 {{< top >}} +## [CREATE OR] REPLACE TABLE +```sql +[CREATE OR] REPLACE TABLE [catalog_name.][db_name.]table_name +[COMMENT table_comment] +WITH (key1=val1, key2=val2, ...) +AS select_query +``` + +**Note** RTAS has the following semantic: +* REPLACE TABLE AS SELECT statement: the target table to be replaced must exist, otherwise, an exception will be thrown. +* CREATE OR REPLACE TABLE AS SELECT statement: the target table to be replaced will be created if it does not exist; if it does exist, it'll be replaced. 
+ +Tables can also be replaced(or created) and populated by the results of a query in one [CREATE OR] REPLACE TABLE(RTAS) statement. RTAS is the simplest and fastest way to replace and insert data into a table with a single command. Review Comment: nit: I'd like to remove `also`, for there's no context in here. It's not like ctas part.' ```suggestion Tables can be replaced(or created) and populated by the results of a query in one [CREATE OR] REPLACE TABLE(RTAS) statement. RTAS is the simplest and fastest way to replace and insert data into a table with a single command. ``` ## docs/content.zh/docs/dev/table/sql/create.md: ## @@ -557,6 +558,58 @@ INSERT INTO my_ctas_table SELECT id, name, age FROM source_table WHERE mod(id, 1 {{< top >}} +## [CREATE OR] REPLACE TABLE +```sql +[CREATE OR] REPLACE TABLE [catalog_name.][db_name.]table_name +[COMMENT table_comment] +WITH (key1=val1, key2=val2, ...) +AS select_query +``` + +**注意** RTAS 有如下语义: +* REPLACE TABLE AS SELECT 语句:要被替换的目标表必须存在,否则会报错。 +* CREATE OR REPLACE TABLE AS SELECT 语句:要被替换的目标表如果不存在,引擎会自动创建;如果存在的话,引擎就直接替换它。 + +表也可以通过一个 [CREATE OR] REPLACE TABLE(RTAS) 语句中的查询结果来替换(或创建)并填充数据,RTAS 是一种简单快捷的替换表并插入数据的方法。 Review Comment: ```suggestion 表可以通过一个 [CREATE OR] REPLACE TABLE(RTAS)语句中的查询结果来替换(或创建)并填充数据,RTAS 是一种简单快捷的替换表并插入数据的方法。 ``` ## docs/content.zh/docs/dev/table/sql/create.md: ## @@ -557,6 +558,58 @@ INSERT INTO my_ctas_table SELECT id, name, age FROM source_table WHERE mod(id, 1 {{< top >}} +## [CREATE OR] REPLACE TABLE +```sql +[CREATE OR] REPLACE TABLE [catalog_name.][db_name.]table_name +[COMMENT table_comment] +WITH (key1=val1, key2=val2, ...) 
+AS select_query +``` + +**注意** RTAS 有如下语义: +* REPLACE TABLE AS SELECT 语句:要被替换的目标表必须存在,否则会报错。 +* CREATE OR REPLACE TABLE AS SELECT 语句:要被替换的目标表如果不存在,引擎会自动创建;如果存在的话,引擎就直接替换它。 + +表也可以通过一个 [CREATE OR] REPLACE TABLE(RTAS) 语句中的查询结果来替换(或创建)并填充数据,RTAS 是一种简单快捷的替换表并插入数据的方法。 Review Comment: ```suggestion 表可以通过一个 [CREATE OR] REPLACE TABLE(RTAS)语句中的查询结果来替换(或创建)并填充数据,RTAS 是一种简单快捷的替换表并插入数据的方法。 ``` ## docs/content.zh/docs/dev/table/sql/create.md: ## @@ -557,6 +558,58 @@ INSERT INTO my_ctas_table SELECT id, name, age FROM source_table WHERE mod(id, 1 {{< top >}} +## [CREATE OR] REPLACE TABLE +```sql +[CREATE OR] REPLACE TABLE [catalog_name.][db_name.]table_name +[COMMENT table_comment] +WITH (key1=val1, key2=val2, ...) +AS select_query +``` + +**注意** RTAS 有如下语义: +* REPLACE TABLE AS SELECT 语句:要被替换的目标表必须存在,否则会报错。 +* CREATE OR REPLACE TABLE AS SELECT 语句:要被替换的目标表如果不存在,引擎会自动创建;如果存在的话,引擎就直接替换它。 + +表也可以通过一个 [CREATE OR] REPLACE TABLE(RTAS) 语句中的查询结果来替换(或创建)并填充数据,RTAS 是一种简单快捷的替换表并插入数据的方法。 Review Comment: ```suggestion 表也可以通过一个 [CREATE OR] REPLACE TABLE(RTAS)语句中的查询结果来替换(或创建)并填充数据,RTAS 是一种简单快捷的替换(或创建)表并插入数据的方法。 ``` ## docs/content.zh/docs/dev/table/sql/create.md: ## @@ -557,6 +558,58 @@ INSERT INTO
[jira] [Closed] (FLINK-32691) Make it possible to use builtin functions without catalog/db set
[ https://issues.apache.org/jira/browse/FLINK-32691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Wysakowicz closed FLINK-32691. Resolution: Fixed Fixed in 3f63e03e83144e9857834f8db1895637d2aa218a > Make it possible to use builtin functions without catalog/db set > > > Key: FLINK-32691 > URL: https://issues.apache.org/jira/browse/FLINK-32691 > Project: Flink > Issue Type: Bug >Reporter: Jim Hughes >Assignee: Jim Hughes >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0 > > > Relative to https://issues.apache.org/jira/browse/FLINK-32584, function > lookup fails without the catalog and database set. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32650) Added the ability to split flink-protobuf codegen code
[ https://issues.apache.org/jira/browse/FLINK-32650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn Visser updated FLINK-32650: --- Fix Version/s: (was: 1.18.0) > Added the ability to split flink-protobuf codegen code > -- > > Key: FLINK-32650 > URL: https://issues.apache.org/jira/browse/FLINK-32650 > Project: Flink > Issue Type: Improvement > Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) >Affects Versions: 1.18.0 >Reporter: 李精卫 >Priority: Major > > h3. Background > Flink serializes and deserializes protobuf-format data by calling the decode or encode method in GeneratedProtoToRow_XXX.java, which is generated by codegen to parse byte[] data into protobuf Java objects. The size of the generated decode/encode method body grows with the number of fields defined in the protobuf message. When the number of fields exceeds a certain threshold and the compiled method body exceeds 8 KB, the decode/encode method will no longer be optimized by the JIT, seriously degrading serialization and deserialization performance. If the compiled method body exceeds 64 KB, the task will fail to start altogether. > h3. Solution > I propose a codegen splitter for protobuf parsing that splits the encode/decode method to solve this problem. > The idea is as follows. Currently, the parsing of every field defined in the protobuf message is placed in a single decode/encode method body. Since the fields share no parameters, multiple fields can be grouped together and parsed in a split method body. Whenever the number of statements in the current method body exceeds the threshold, a split method is generated, those fields are parsed inside it, and the split method is called from the decode/encode method. Repeating this process yields a decode/encode method composed of calls to split methods.
> Example decode method after splitting: > > {code:java} > public static RowData decode(org.apache.flink.formats.protobuf.testproto.AdProfile.AdProfilePb message) { > RowData rowData = null; > org.apache.flink.formats.protobuf.testproto.AdProfile.AdProfilePb message1242 = message; > GenericRowData rowData1242 = new GenericRowData(5); > split2585(rowData1242, message1242); > rowData = rowData1242; > return rowData; > } > {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
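The splitting approach described in FLINK-32650 above can be illustrated with a small, self-contained sketch. This is not the actual generated code: the `split0`/`split1` helpers and the plain arrays are hypothetical stand-ins for the generated split methods and the protobuf/`RowData` types, used only to show how field assignments are grouped into helper methods so that each method body stays below the JIT compilation limits.

```java
import java.util.Arrays;

public class SplitDecodeSketch {
    // Simulates the generated decode(): rather than assigning all fields
    // inline, it delegates to split methods, each handling a bounded group
    // of fields, keeping every method body small.
    public static Object[] decode(int[] message) {
        Object[] row = new Object[message.length];
        split0(row, message); // fields 0..1
        split1(row, message); // fields 2..3
        return row;
    }

    // Each split method parses a fixed group of fields; the fields share no
    // state, so the grouping is arbitrary and bounded only by method size.
    private static void split0(Object[] row, int[] message) {
        row[0] = message[0];
        row[1] = message[1];
    }

    private static void split1(Object[] row, int[] message) {
        row[2] = message[2];
        row[3] = message[3];
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(decode(new int[] {1, 2, 3, 4})));
    }
}
```

The real generator applies the same idea per message type, emitting a new split method whenever the accumulated statement count crosses its threshold.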
[GitHub] [flink-connector-rabbitmq] MartijnVisser commented on pull request #1: [FLINK-20628] RabbitMQ Connector using FLIP-27 Source API
MartijnVisser commented on PR #1: URL: https://github.com/apache/flink-connector-rabbitmq/pull/1#issuecomment-1655509560 > I am working for it now. Any update to report from your end on this @RocMarshal ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-32672) Migrate RabbitMQ connector to Source V2 API
[ https://issues.apache.org/jira/browse/FLINK-32672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748561#comment-17748561 ] Martijn Visser commented on FLINK-32672: There is already FLINK-20628 and a PR https://github.com/apache/flink-connector-rabbitmq/pull/1 - it just appears that the committer isn't responding to the review comments left by [~chesnay]. It's kind of a similar problem to the Google Pubsub connector: we need maintainers, else I don't see how we can keep this connector alive. > Migrate RabbitMQ connector to Source V2 API > --- > > Key: FLINK-32672 > URL: https://issues.apache.org/jira/browse/FLINK-32672 > Project: Flink > Issue Type: Sub-task > Components: Connectors / RabbitMQ >Reporter: Alexander Fedulov >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32673) Migrate Google PubSub connector to V2
[ https://issues.apache.org/jira/browse/FLINK-32673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748559#comment-17748559 ] Martijn Visser commented on FLINK-32673: There are two things, imho: 1. I can restart the discussion I originally opened on this at https://lists.apache.org/thread/c1stxqmwdzxmpjp6pj2pxsvmpbtwhqnf to see if someone comes forward to help out 2. If no-one comes forward, then I would be inclined to open a vote to remove the connector from Flink given a lack of maintainers > Migrate Google PubSub connector to V2 > - > > Key: FLINK-32673 > URL: https://issues.apache.org/jira/browse/FLINK-32673 > Project: Flink > Issue Type: Sub-task >Reporter: Alexander Fedulov >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32670) Cascade deprecation to classes that implement SourceFunction
[ https://issues.apache.org/jira/browse/FLINK-32670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Fedulov updated FLINK-32670: -- Description: (was: ParallelSourceFunction, RichParallelSourceFunction, ExternallyInducedSource) > Cascade deprecation to classes that implement SourceFunction > > > Key: FLINK-32670 > URL: https://issues.apache.org/jira/browse/FLINK-32670 > Project: Flink > Issue Type: Sub-task >Reporter: Alexander Fedulov >Assignee: Alexander Fedulov >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32697) When using JDBC to insert data into the oracle database, an error will be reported if the target field is lowercase,
[ https://issues.apache.org/jira/browse/FLINK-32697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748557#comment-17748557 ] Martijn Visser commented on FLINK-32697: [~sunyanyong] Do you want to fix it? > When using JDBC to insert data into the oracle database, an error will be > reported if the target field is lowercase, > > > Key: FLINK-32697 > URL: https://issues.apache.org/jira/browse/FLINK-32697 > Project: Flink > Issue Type: Bug > Components: Connectors / JDBC >Affects Versions: jdbc-3.1.1 >Reporter: sunyanyong >Priority: Major > Attachments: image-2023-07-27-10-39-31-884.png > > > When using JDBC to insert data into an Oracle database, an error is reported if the target field name is lowercase. > An example input is "insert into tableName(ID, `"name"`) values(1, 'test')". > The error is caused by the judgment logic in FieldNamedPreparedStatementImpl.java, which treats double quotes as delimiters. > !image-2023-07-27-10-39-31-884.png|width=876,height=540! > > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
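The quote-handling problem described in FLINK-32697 above can be illustrated with a quote-aware named-parameter parser. This is a hedged, illustrative sketch only — the `parse` method, its signature, and the `:name`-style placeholders are assumptions for the example and are not the actual FieldNamedPreparedStatementImpl code; the point is that quoted regions must be copied verbatim so a double-quoted (e.g. lowercase Oracle) identifier is never mistaken for a delimiter.

```java
import java.util.ArrayList;
import java.util.List;

public class NamedParameterParser {
    // Replaces :name placeholders with '?' while leaving double-quoted
    // identifiers and single-quoted string literals untouched.
    public static String parse(String sql, List<String> paramsOut) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < sql.length()) {
            char c = sql.charAt(i);
            if (c == '"' || c == '\'') {
                // Copy the whole quoted region verbatim.
                int end = sql.indexOf(c, i + 1);
                end = (end < 0) ? sql.length() - 1 : end;
                out.append(sql, i, end + 1);
                i = end + 1;
            } else if (c == ':' && i + 1 < sql.length()
                    && Character.isJavaIdentifierStart(sql.charAt(i + 1))) {
                // Collect the placeholder name and emit a positional '?'.
                int j = i + 1;
                while (j < sql.length() && Character.isJavaIdentifierPart(sql.charAt(j))) {
                    j++;
                }
                paramsOut.add(sql.substring(i + 1, j));
                out.append('?');
                i = j;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> params = new ArrayList<>();
        String parsed = parse("INSERT INTO t (ID, \"name\") VALUES (:id, :name)", params);
        System.out.println(parsed);
        System.out.println(params);
    }
}
```

With this approach, `"name"` passes through unchanged while `:id` and `:name` become positional parameters.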
[jira] [Updated] (FLINK-32697) When using JDBC to insert data into the oracle database, an error will be reported if the target field is lowercase,
[ https://issues.apache.org/jira/browse/FLINK-32697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn Visser updated FLINK-32697: --- Priority: Major (was: Blocker) > When using JDBC to insert data into the oracle database, an error will be > reported if the target field is lowercase, > > > Key: FLINK-32697 > URL: https://issues.apache.org/jira/browse/FLINK-32697 > Project: Flink > Issue Type: Bug > Components: Connectors / JDBC >Affects Versions: jdbc-3.1.1 >Reporter: sunyanyong >Priority: Major > Attachments: image-2023-07-27-10-39-31-884.png > > > When using JDBC to insert data into the oracle database, if the target field > is lowercase, an error will be reported. > An example input format is "insert into tableName(ID, `"name"`) values(1, > 'test')". > The reason for the error is that there is a problem with the judgment logic > in the > FieldNamedPreparedStatementImpl.java. The judgment logic will treat double > quotes as delimiters. > !image-2023-07-27-10-39-31-884.png|width=876,height=540! > > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32670) Cascade deprecation to classes that implement SourceFunction
[ https://issues.apache.org/jira/browse/FLINK-32670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Fedulov updated FLINK-32670: -- Summary: Cascade deprecation to classes that implement SourceFunction (was: Annotate interfaces that inherit from SourceFunction as deprecated ) > Cascade deprecation to classes that implement SourceFunction > > > Key: FLINK-32670 > URL: https://issues.apache.org/jira/browse/FLINK-32670 > Project: Flink > Issue Type: Sub-task >Reporter: Alexander Fedulov >Assignee: Alexander Fedulov >Priority: Major > Labels: pull-request-available > > ParallelSourceFunction, RichParallelSourceFunction, ExternallyInducedSource -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32694) Migrate classes that implement ParallelSourceFunction to Source V2 API
[ https://issues.apache.org/jira/browse/FLINK-32694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Fedulov updated FLINK-32694: -- Summary: Migrate classes that implement ParallelSourceFunction to Source V2 API (was: Cascade deprecation to classes that implement ParallelSourceFunction) > Migrate classes that implement ParallelSourceFunction to Source V2 API > -- > > Key: FLINK-32694 > URL: https://issues.apache.org/jira/browse/FLINK-32694 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Common >Reporter: Alexander Fedulov >Assignee: Alexander Fedulov >Priority: Major > > > * SimpleEndlessSourceWithBloatedState in HeavyDeploymentStressTestProgram > (org.apache.flink.deployment) > * StatefulSequenceSource (org.apache.flink.streaming.api.functions.source) > * ArrowSourceFunction (org.apache.flink.table.runtime.arrow.sources) > * InputFormatSourceFunction (org.apache.flink.streaming.api.functions.source) > * EventsGeneratorSource > (org.apache.flink.streaming.examples.statemachine.generator) > * FromSplittableIteratorFunction > (org.apache.flink.streaming.api.functions.source) > * DataGeneratorSource > (org.apache.flink.streaming.api.functions.source.datagen) > * -- Tests: > * MySource in StatefulStreamingJob (org.apache.flink.test) > * RandomLongSource in StickyAllocationAndLocalRecoveryTestJob > (org.apache.flink.streaming.tests) > * SequenceGeneratorSource (org.apache.flink.streaming.tests) > * TtlStateUpdateSource (org.apache.flink.streaming.tests) > * StringSourceFunction in NettyShuffleMemoryControlTestProgram > (org.apache.flink.streaming.tests) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (FLINK-32694) Cascade deprecation to classes that implement ParallelSourceFunction
[ https://issues.apache.org/jira/browse/FLINK-32694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Fedulov reassigned FLINK-32694: - Assignee: Alexander Fedulov > Cascade deprecation to classes that implement ParallelSourceFunction > > > Key: FLINK-32694 > URL: https://issues.apache.org/jira/browse/FLINK-32694 > Project: Flink > Issue Type: Sub-task > Components: Connectors / Common >Reporter: Alexander Fedulov >Assignee: Alexander Fedulov >Priority: Major > > > * SimpleEndlessSourceWithBloatedState in HeavyDeploymentStressTestProgram > (org.apache.flink.deployment) > * StatefulSequenceSource (org.apache.flink.streaming.api.functions.source) > * ArrowSourceFunction (org.apache.flink.table.runtime.arrow.sources) > * InputFormatSourceFunction (org.apache.flink.streaming.api.functions.source) > * EventsGeneratorSource > (org.apache.flink.streaming.examples.statemachine.generator) > * FromSplittableIteratorFunction > (org.apache.flink.streaming.api.functions.source) > * DataGeneratorSource > (org.apache.flink.streaming.api.functions.source.datagen) > * -- Tests: > * MySource in StatefulStreamingJob (org.apache.flink.test) > * RandomLongSource in StickyAllocationAndLocalRecoveryTestJob > (org.apache.flink.streaming.tests) > * SequenceGeneratorSource (org.apache.flink.streaming.tests) > * TtlStateUpdateSource (org.apache.flink.streaming.tests) > * StringSourceFunction in NettyShuffleMemoryControlTestProgram > (org.apache.flink.streaming.tests) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] yunfengzhou-hub commented on a diff in pull request #22931: [FLINK-32514] Support configuring checkpointing interval during process backlog
yunfengzhou-hub commented on code in PR #22931: URL: https://github.com/apache/flink/pull/22931#discussion_r1277383109 ## flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java: ## @@ -183,6 +189,15 @@ public class CheckpointCoordinator { /** A handle to the current periodic trigger, to cancel it when necessary. */ private ScheduledFuture currentPeriodicTrigger; +/** + * The timestamp (via {@link Clock#relativeTimeMillis()}) when the next checkpoint will be + * triggered. + * + * If its value is {@link Long#MAX_VALUE}, it means there is not a next checkpoint + * scheduled. + */ +private volatile long nextCheckpointTriggeringRelativeTime; Review Comment: `nextCheckpointTriggeringRelativeTime` and `currentPeriodicTrigger` should be updated together, so I added `synchronized(lock){}` around code blocks related to these variables instead of adding `@GuardedBy("lock")` to each of them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
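The locking pattern discussed in the review comment above — always reading and writing the two related fields together under one shared lock, rather than annotating each field individually with `@GuardedBy` — can be sketched as follows. The class and field names here are simplified stand-ins for the CheckpointCoordinator fields, and `Runnable` stands in for the `ScheduledFuture` trigger handle; this is an illustration of the pattern, not the PR's implementation.

```java
public class TriggerState {
    private final Object lock = new Object();

    // Long.MAX_VALUE means no next checkpoint is scheduled.
    private long nextTriggerRelativeTime = Long.MAX_VALUE;
    private Runnable currentPeriodicTrigger;

    public void schedule(long triggerTime, Runnable trigger) {
        synchronized (lock) {
            // Both fields change atomically, so a reader never observes a
            // trigger time that belongs to a stale trigger handle.
            nextTriggerRelativeTime = triggerTime;
            currentPeriodicTrigger = trigger;
        }
    }

    public void cancel() {
        synchronized (lock) {
            nextTriggerRelativeTime = Long.MAX_VALUE;
            currentPeriodicTrigger = null;
        }
    }

    public boolean hasScheduledCheckpoint() {
        synchronized (lock) {
            // Reading under the same lock keeps the pair consistent.
            return nextTriggerRelativeTime != Long.MAX_VALUE && currentPeriodicTrigger != null;
        }
    }
}
```

Wrapping every access in `synchronized (lock)` gives the same guarantee that a per-field `@GuardedBy("lock")` annotation documents, while making the pairwise update explicit in the code.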
[jira] [Comment Edited] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
[ https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748537#comment-17748537 ] Matthias Pohl edited comment on FLINK-31168 at 7/28/23 10:18 AM: - So far, I wasn't able to reproduce it locally with up to now 300 test runs. Update: The test runs happened with JDK11. I noticed that the two test failures in question happened with JDK17. I'm gonna continue looking into the issue in that direction. was (Author: mapohl): So far, I wasn't able to reproduce it locally with up to now 230 test runs. > JobManagerHAProcessFailureRecoveryITCase failed due to job not being found > -- > > Key: FLINK-31168 > URL: https://issues.apache.org/jira/browse/FLINK-31168 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.3, 1.16.1, 1.18.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Blocker > Labels: test-stability > Fix For: 1.18.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706 > We see this build failure because a job couldn't be found: > {code} > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:319) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:958) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase.testJobManagerFailure(JobManagerHAProcessFailureRecoveryITCase.java:235) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase$4.run(JobManagerHAProcessFailureRecoveryITCase.java:336) > Caused by: 
java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056) > ... 4 more > Caused by: java.lang.RuntimeException: Error while waiting for job to be > initialized > at > org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:160) > at > org.apache.flink.client.deployment.executors.AbstractSessionClusterExecutor.lambda$execute$2(AbstractSessionClusterExecutor.java:82) > at > org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73) > at > java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642) > at > java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479) > at > java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) > at > java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) > at > java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) > at > java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) > at > java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) > Caused by: java.util.concurrent.ExecutionException: > org.apache.flink.runtime.rest.util.RestClientException: > [org.apache.flink.runtime.rest.NotFoundException: Job > 865dcd87f4828dbeb3d93eb52e2636b1 not found > at > org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$1(AbstractExecutionGraphHandler.java:99) > at > java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986) > at > 
java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970) > at > java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > at > java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > at > org.apache.flink.runtime.rest.handler.legacy.DefaultExecutionGraphCache.lambda$getExecutionGraphInternal$0(DefaultExecutionGraphCache.java:109) > at > java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) > at >
[jira] [Commented] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
[ https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748537#comment-17748537 ] Matthias Pohl commented on FLINK-31168: --- So far, I wasn't able to reproduce it locally with up to now 230 test runs. > JobManagerHAProcessFailureRecoveryITCase failed due to job not being found > -- > > Key: FLINK-31168 > URL: https://issues.apache.org/jira/browse/FLINK-31168 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.3, 1.16.1, 1.18.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Blocker > Labels: test-stability > Fix For: 1.18.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706 > We see this build failure because a job couldn't be found: > {code} > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:319) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:958) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase.testJobManagerFailure(JobManagerHAProcessFailureRecoveryITCase.java:235) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase$4.run(JobManagerHAProcessFailureRecoveryITCase.java:336) > Caused by: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > at > 
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056) > ... 4 more > Caused by: java.lang.RuntimeException: Error while waiting for job to be > initialized > at > org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:160) > at > org.apache.flink.client.deployment.executors.AbstractSessionClusterExecutor.lambda$execute$2(AbstractSessionClusterExecutor.java:82) > at > org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73) > at > java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642) > at > java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479) > at > java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) > at > java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) > at > java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) > at > java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) > at > java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) > Caused by: java.util.concurrent.ExecutionException: > org.apache.flink.runtime.rest.util.RestClientException: > [org.apache.flink.runtime.rest.NotFoundException: Job > 865dcd87f4828dbeb3d93eb52e2636b1 not found > at > org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$1(AbstractExecutionGraphHandler.java:99) > at > java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986) > at > java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970) > at > java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > at > 
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > at > org.apache.flink.runtime.rest.handler.legacy.DefaultExecutionGraphCache.lambda$getExecutionGraphInternal$0(DefaultExecutionGraphCache.java:109) > at > java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) > at > java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) > at > java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > at > java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > at >
[jira] [Comment Edited] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
[ https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748533#comment-17748533 ] Matthias Pohl edited comment on FLINK-31168 at 7/28/23 9:49 AM: It appears that the leader election is not causing the problem: The JobGraph is written properly to the JobGraphStore under {{flink/default/jobgraphs}} in the JM run #0 (see [log line in build #51299|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9106]): {code} Jul 17 01:09:58 01:09:47,954 10630 [flink-akka.actor.default-dispatcher-20] INFO org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Added JobGraph(jobId: 10ea837b5ff5916e57e0f49298d952d3) to ZooKeeperStateHandleStore{namespace='flink/default/jobgraphs'}. {code} A leader is picked in JM run #1 that tries to recover the data from the correct ZNode {{flink/default/jobgraphs}} (see [log line in build #51299|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9917]). But it appears that the ZNode is empty: {code} Jul 17 01:09:59 01:09:55,657 6184 [cluster-io-thread-2] INFO org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job ids [] from ZooKeeperStateHandleStore{namespace='flink/default/jobgraphs'} Jul 17 01:09:59 01:09:55,657 6184 [cluster-io-thread-2] INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Successfully recovered 0 persisted job graphs. 
{code} was (Author: mapohl): It appears that the leader election is not causing the problem: The JobGraph is written properly to the JobGraphStore under {{flink/default/jobgraphs}} in the JM run #0 (see [log line in build #51299|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9106]): {code} INFO org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Added JobGraph(jobId: 10ea837b5ff5916e57e0f49298d952d3) to ZooKeeperStateHandleStore{namespace='flink/default/jobgraphs'}. {code} A leader is picked in JM run #1 that tries to recover the data from the correct ZNode {{flink/default/jobgraphs}} (see [log line in build #51299|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9917]). But it appears that the ZNode is empty: {code} Jul 17 01:09:59 01:09:55,657 6184 [cluster-io-thread-2] INFO org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job ids [] from ZooKeeperStateHandleStore{namespace='flink/default/jobgraphs'} Jul 17 01:09:59 01:09:55,657 6184 [cluster-io-thread-2] INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Successfully recovered 0 persisted job graphs. 
{code} > JobManagerHAProcessFailureRecoveryITCase failed due to job not being found > -- > > Key: FLINK-31168 > URL: https://issues.apache.org/jira/browse/FLINK-31168 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.3, 1.16.1, 1.18.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Blocker > Labels: test-stability > Fix For: 1.18.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706 > We see this build failure because a job couldn't be found: > {code} > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:319) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:958) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase.testJobManagerFailure(JobManagerHAProcessFailureRecoveryITCase.java:235) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase$4.run(JobManagerHAProcessFailureRecoveryITCase.java:336) > Caused by: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > at >
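The invariant being violated in the analysis above can be sketched with a minimal in-memory stand-in for the job-graph store contract (a hypothetical simplification, not Flink's actual ZooKeeperStateHandleStore): a job graph added by JM run #0 must still be listed when JM run #1 recovers; the failing build instead observed `Retrieved job ids []`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal stand-in for a job-graph store keyed by job ID.
// In the real setup this is a ZooKeeper ZNode (flink/default/jobgraphs).
class InMemoryJobGraphStore {
    private final Map<String, String> graphs = new LinkedHashMap<>();

    void addJobGraph(String jobId, String graph) {
        graphs.put(jobId, graph); // JM run #0: "Added JobGraph(jobId: ...)"
    }

    List<String> getJobIds() {
        return new ArrayList<>(graphs.keySet()); // JM run #1: recovery reads this
    }
}

public class RecoveryInvariant {
    public static void main(String[] args) {
        InMemoryJobGraphStore store = new InMemoryJobGraphStore();
        store.addJobGraph("10ea837b5ff5916e57e0f49298d952d3", "graph");
        // Expected after failover: exactly the persisted job id is recovered.
        List<String> recovered = store.getJobIds();
        if (recovered.size() != 1) throw new AssertionError("recovery lost the job graph");
        System.out.println("Recovered " + recovered.size() + " persisted job graph(s).");
    }
}
```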
[jira] [Comment Edited] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
[ https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748533#comment-17748533 ] Matthias Pohl edited comment on FLINK-31168 at 7/28/23 9:48 AM: It appears that the leader election is not causing the problem: The JobGraph is written properly to the JobGraphStore under {{flink/default/jobgraphs}} in the JM run #0 (see [log line in build #51299|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9106]): {quote} INFO org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Added JobGraph(jobId: 10ea837b5ff5916e57e0f49298d952d3) to ZooKeeperStateHandleStore{namespace='flink/default/jobgraphs'}. {quote} A leader is picked in JM run #1 that tries to recover the data from the correct ZNode {{flink/default/jobgraphs}} (see [log line in build #51299|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9917]). But it appears that the ZNode is empty: {quote} Jul 17 01:09:59 01:09:55,657 6184 [cluster-io-thread-2] INFO org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job ids [] from ZooKeeperStateHandleStore{namespace='flink/default/jobgraphs'} Jul 17 01:09:59 01:09:55,657 6184 [cluster-io-thread-2] INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Successfully recovered 0 persisted job graphs. 
{quote} was (Author: mapohl): It appears that the leader election is not causing the problem: The JobGraph is written properly to the JobGraphStore under {{flink/default/jobgraphs}} in the JM run #0 (see [log line in build #51299|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9106]) and a leader is picked in JM run #1 that tries to recover the data from the correct ZNode {{flink/default/jobgraphs}} (see [log line in build #51299|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9917]). But it appears that the ZNode is empty. > JobManagerHAProcessFailureRecoveryITCase failed due to job not being found > -- > > Key: FLINK-31168 > URL: https://issues.apache.org/jira/browse/FLINK-31168 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.3, 1.16.1, 1.18.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Blocker > Labels: test-stability > Fix For: 1.18.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706 > We see this build failure because a job couldn't be found: > {code} > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:319) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:958) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942) > at > 
org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase.testJobManagerFailure(JobManagerHAProcessFailureRecoveryITCase.java:235) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase$4.run(JobManagerHAProcessFailureRecoveryITCase.java:336) > Caused by: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056) > ... 4 more > Caused by: java.lang.RuntimeException: Error while waiting for job to be > initialized > at > org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:160) > at > org.apache.flink.client.deployment.executors.AbstractSessionClusterExecutor.lambda$execute$2(AbstractSessionClusterExecutor.java:82) > at > org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73) > at >
[jira] [Comment Edited] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
[ https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748533#comment-17748533 ] Matthias Pohl edited comment on FLINK-31168 at 7/28/23 9:48 AM: It appears that the leader election is not causing the problem: The JobGraph is written properly to the JobGraphStore under {{flink/default/jobgraphs}} in the JM run #0 (see [log line in build #51299|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9106]): {code} INFO org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Added JobGraph(jobId: 10ea837b5ff5916e57e0f49298d952d3) to ZooKeeperStateHandleStore{namespace='flink/default/jobgraphs'}. {code} A leader is picked in JM run #1 that tries to recover the data from the correct ZNode {{flink/default/jobgraphs}} (see [log line in build #51299|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9917]). But it appears that the ZNode is empty: {code} Jul 17 01:09:59 01:09:55,657 6184 [cluster-io-thread-2] INFO org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job ids [] from ZooKeeperStateHandleStore{namespace='flink/default/jobgraphs'} Jul 17 01:09:59 01:09:55,657 6184 [cluster-io-thread-2] INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Successfully recovered 0 persisted job graphs. 
{code} was (Author: mapohl): It appears that the leader election is not causing the problem: The JobGraph is written properly to the JobGraphStore under {{flink/default/jobgraphs}} in the JM run #0 (see [log line in build #51299|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9106]): {quote} INFO org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Added JobGraph(jobId: 10ea837b5ff5916e57e0f49298d952d3) to ZooKeeperStateHandleStore{namespace='flink/default/jobgraphs'}. {quote} A leader is picked in JM run #1 that tries to recover the data from the correct ZNode {{flink/default/jobgraphs}} (see [log line in build #51299|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9917]). But it appears that the ZNode is empty: {quote} Jul 17 01:09:59 01:09:55,657 6184 [cluster-io-thread-2] INFO org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job ids [] from ZooKeeperStateHandleStore{namespace='flink/default/jobgraphs'} Jul 17 01:09:59 01:09:55,657 6184 [cluster-io-thread-2] INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - Successfully recovered 0 persisted job graphs. 
{quote} > JobManagerHAProcessFailureRecoveryITCase failed due to job not being found > -- > > Key: FLINK-31168 > URL: https://issues.apache.org/jira/browse/FLINK-31168 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.3, 1.16.1, 1.18.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Blocker > Labels: test-stability > Fix For: 1.18.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706 > We see this build failure because a job couldn't be found: > {code} > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:319) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:958) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase.testJobManagerFailure(JobManagerHAProcessFailureRecoveryITCase.java:235) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase$4.run(JobManagerHAProcessFailureRecoveryITCase.java:336) > Caused by: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
[jira] [Commented] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
[ https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748533#comment-17748533 ] Matthias Pohl commented on FLINK-31168: --- It appears that the leader election is not causing the problem: The JobGraph is written properly to the JobGraphStore under {{flink/default/jobgraphs}} in the JM run #0 (see [log line in build #51299|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9106]) and a leader is picked in JM run #1 that tries to recover the data from the correct ZNode {{flink/default/jobgraphs}} (see [log line in build #51299|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9917]). But it appears that the ZNode is empty. > JobManagerHAProcessFailureRecoveryITCase failed due to job not being found > -- > > Key: FLINK-31168 > URL: https://issues.apache.org/jira/browse/FLINK-31168 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.3, 1.16.1, 1.18.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Blocker > Labels: test-stability > Fix For: 1.18.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706 > We see this build failure because a job couldn't be found: > {code} > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:319) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:958) > at > 
org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase.testJobManagerFailure(JobManagerHAProcessFailureRecoveryITCase.java:235) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase$4.run(JobManagerHAProcessFailureRecoveryITCase.java:336) > Caused by: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056) > ... 4 more > Caused by: java.lang.RuntimeException: Error while waiting for job to be > initialized > at > org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:160) > at > org.apache.flink.client.deployment.executors.AbstractSessionClusterExecutor.lambda$execute$2(AbstractSessionClusterExecutor.java:82) > at > org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73) > at > java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642) > at > java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479) > at > java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) > at > java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) > at > java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) > at > java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) > at > java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) > Caused by: java.util.concurrent.ExecutionException: > org.apache.flink.runtime.rest.util.RestClientException: > 
[org.apache.flink.runtime.rest.NotFoundException: Job > 865dcd87f4828dbeb3d93eb52e2636b1 not found > at > org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$1(AbstractExecutionGraphHandler.java:99) > at > java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986) > at > java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970) > at > java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > at >
[jira] [Created] (FLINK-32710) The LeaderElection component IDs for running is only the JobID which might be confusing in the log output
Matthias Pohl created FLINK-32710: - Summary: The LeaderElection component IDs for running is only the JobID which might be confusing in the log output Key: FLINK-32710 URL: https://issues.apache.org/jira/browse/FLINK-32710 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.18.0 Reporter: Matthias Pohl I noticed that the leader log messages for the jobs use the plain job ID as the component ID. That might be confusing when reading the logs since it's a UUID with no additional context. We might want to add a prefix (e.g. {{job-}}) to these component IDs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
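A minimal sketch of the prefixing proposed in the ticket (the helper name and ID format are assumptions for illustration, not the actual Flink change):

```java
import java.util.UUID;

public class ComponentIdPrefix {
    // Hypothetical helper: derive a leader-election component ID from a job ID.
    // Prefixing with "job-" gives the otherwise context-free UUID-like ID
    // some meaning when it shows up in log output.
    static String jobComponentId(String jobId) {
        return "job-" + jobId;
    }

    public static void main(String[] args) {
        // Flink job IDs render as 32 hex characters, similar to a UUID without dashes.
        String jobId = UUID.randomUUID().toString().replace("-", "");
        System.out.println("LeaderElection component: " + jobComponentId(jobId));
    }
}
```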
[jira] [Updated] (FLINK-32710) The LeaderElection component IDs for running is only the JobID which might be confusing in the log output
[ https://issues.apache.org/jira/browse/FLINK-32710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-32710: -- Issue Type: Improvement (was: Bug) > The LeaderElection component IDs for running is only the JobID which might be > confusing in the log output > - > > Key: FLINK-32710 > URL: https://issues.apache.org/jira/browse/FLINK-32710 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.18.0 >Reporter: Matthias Pohl >Priority: Minor > Labels: starter > > I noticed that the leader log messages for the jobs use the plain job ID as > the component ID. That might be confusing when reading the logs since it's a > UUID with no additional context. > We might want to add a prefix (e.g. {{job-}}) to these component IDs.
[jira] [Updated] (FLINK-32710) The LeaderElection component IDs for running is only the JobID which might be confusing in the log output
[ https://issues.apache.org/jira/browse/FLINK-32710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-32710: -- Labels: starter (was: ) > The LeaderElection component IDs for running is only the JobID which might be > confusing in the log output > - > > Key: FLINK-32710 > URL: https://issues.apache.org/jira/browse/FLINK-32710 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.18.0 >Reporter: Matthias Pohl >Priority: Minor > Labels: starter > > I noticed that the leader log messages for the jobs use the plain job ID as > the component ID. That might be confusing when reading the logs since it's a > UUID with no additional context. > We might want to add a prefix (e.g. {{job-}}) to these component IDs.
[GitHub] [flink] 1996fanrui commented on pull request #23094: [FLINK-32705][connectors/common] Remove the beta for watermark alignment
1996fanrui commented on PR #23094: URL: https://github.com/apache/flink/pull/23094#issuecomment-1655369644 The actual CI passed; only the log upload and docker image caching steps failed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Comment Edited] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
[ https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748491#comment-17748491 ] Matthias Pohl edited comment on FLINK-31168 at 7/28/23 9:20 AM: The most recent failures seem to have been caused by the job recovery not being successful: * [20230711.1 (#51165) Dispatcher #1 output in line 8688|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51165=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=8688] * [20230717.1 (#51299) Dispatcher #1 output in line 9918|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9918] I'm gonna raise the priority of this issue to blocker because it could be related to the leader election changes. The job is not picked up anymore and therefore cannot be saved in the {{ExecutionGraphInfoStore}}. The job client will wait for the initialization phase to be over and then request the JobResult, which calls {{Dispatcher.requestJobStatus}}. {{requestJobStatus}} won't find a {{JobManagerRunner}} in {{Dispatcher#jobManagerRunnerRegistry}} and none in the {{Dispatcher#executionGraphInfoStore}} (see [Dispatcher#requestJobStatus|https://github.com/apache/flink/blob/ab9445aca56e7d139e8fd9bcc23e5f7e06288e66/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L902]). Therefore, the response future will complete exceptionally with a {{FlinkJobNotFoundException}}, causing the error which we're seeing in the last two CI failures. 
was (Author: mapohl): The most-recent failures seem to have been caused by the job recovery not being successful: * [20230711.1 (#51165) Dispatcher #1 output in line 8688|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51165=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=8688] * [20230717.1 (#51299) Dispatcher #1 output in line 9918|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9918] I'm gonna raise the priority of this issue to blocker because it could be related to the leader election changes. The job is not picked up anymore and therefore, cannot be saved in the {{ExecutionGraphInfoStore}}. The job client will wait for the initialization phase to be over and then requests the JobResult which calls {{Dispatcher.requestJobStatus}}. {{requestJobStatus}} won't find a {{JobManagerRunner}} in {{Dispatcher#jobManagerRunnerRegistry}} and non in the {{Dispatcher#executionGraphInfoStore}} (see [Dispatcher#requestJobStatus|https://github.com/apache/flink/blob/ab9445aca56e7d139e8fd9bcc23e5f7e06288e66/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L902]. Therefore, the response future will complete exceptionally with a {{FlinkJobNotFoundException}} causing the error which we're seeing in the last two CI failures. 
> JobManagerHAProcessFailureRecoveryITCase failed due to job not being found > -- > > Key: FLINK-31168 > URL: https://issues.apache.org/jira/browse/FLINK-31168 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.3, 1.16.1, 1.18.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Blocker > Labels: test-stability > Fix For: 1.18.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706 > We see this build failure because a job couldn't be found: > {code} > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:319) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:958) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase.testJobManagerFailure(JobManagerHAProcessFailureRecoveryITCase.java:235) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase$4.run(JobManagerHAProcessFailureRecoveryITCase.java:336) > Caused by: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at >
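The lookup order described in the comment above can be modeled with a simplified sketch (not Flink's actual Dispatcher code; names and types are reduced for illustration): running jobs are checked first, then the store of finished jobs, and only if neither knows the job does the request fail with a job-not-found error.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Simplified model of Dispatcher#requestJobStatus as described in the comment:
// first the registry of running jobs (~ jobManagerRunnerRegistry), then the
// store of finished jobs (~ executionGraphInfoStore), otherwise fail.
public class JobStatusLookup {
    final Map<String, String> runningJobs = new HashMap<>();
    final Map<String, String> finishedJobs = new HashMap<>();

    String requestJobStatus(String jobId) throws Exception {
        return Optional.ofNullable(runningJobs.get(jobId))
                .or(() -> Optional.ofNullable(finishedJobs.get(jobId)))
                // Neither store knows the job: the response future would
                // complete exceptionally, surfacing as "Job ... not found".
                .orElseThrow(() -> new Exception("FlinkJobNotFoundException: " + jobId));
    }

    public static void main(String[] args) throws Exception {
        JobStatusLookup d = new JobStatusLookup();
        d.finishedJobs.put("865dcd87f4828dbeb3d93eb52e2636b1", "FINISHED");
        System.out.println(d.requestJobStatus("865dcd87f4828dbeb3d93eb52e2636b1"));
        try {
            d.requestJobStatus("unknown");
        } catch (Exception e) {
            System.out.println("lookup failed as described: " + e.getMessage());
        }
    }
}
```

In the failing test runs, the job was never re-registered after failover, so both lookups missed and the client saw the `NotFoundException` from the stack trace.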
[jira] [Comment Edited] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
[ https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748491#comment-17748491 ] Matthias Pohl edited comment on FLINK-31168 at 7/28/23 9:20 AM: The most recent failures seem to have been caused by the job recovery not being successful: * [20230711.1 (#51165) Dispatcher #1 output in line 8688|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51165=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=8688] * [20230717.1 (#51299) Dispatcher #1 output in line 9918|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9918] I'm gonna raise the priority of this issue to blocker because it could be related to the leader election changes. The job is not picked up anymore and therefore cannot be saved in the {{ExecutionGraphInfoStore}}. The job client will wait for the initialization phase to be over and then request the JobResult, which calls {{Dispatcher.requestJobStatus}}. {{requestJobStatus}} won't find a {{JobManagerRunner}} in {{Dispatcher#jobManagerRunnerRegistry}} and none in the {{Dispatcher#executionGraphInfoStore}} (see [Dispatcher#requestJobStatus|https://github.com/apache/flink/blob/ab9445aca56e7d139e8fd9bcc23e5f7e06288e66/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/Dispatcher.java#L902]). Therefore, the response future will complete exceptionally with a {{FlinkJobNotFoundException}}, causing the error which we're seeing in the last two CI failures. 
was (Author: mapohl): The most-recent failures seem to have been caused by the job recovery not being successful: * [20230711.1 (#51165) Dispatcher #1 output in line 8688|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51165=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=8688] * [20230717.1 (#51299) Dispatcher #1 output in line 9918|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9918] I'm gonna raise the priority of this issue to blocker because it could be related to the leader election changes. > JobManagerHAProcessFailureRecoveryITCase failed due to job not being found > -- > > Key: FLINK-31168 > URL: https://issues.apache.org/jira/browse/FLINK-31168 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.3, 1.16.1, 1.18.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Blocker > Labels: test-stability > Fix For: 1.18.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706 > We see this build failure because a job couldn't be found: > {code} > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:319) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:958) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase.testJobManagerFailure(JobManagerHAProcessFailureRecoveryITCase.java:235) > at > 
org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase$4.run(JobManagerHAProcessFailureRecoveryITCase.java:336) > Caused by: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056) > ... 4 more > Caused by: java.lang.RuntimeException: Error while waiting for job to be > initialized > at > org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:160) > at > org.apache.flink.client.deployment.executors.AbstractSessionClusterExecutor.lambda$execute$2(AbstractSessionClusterExecutor.java:82) > at > org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73) > at >
[jira] [Updated] (FLINK-32708) Fix the write logic in remote tier of Hybrid Shuffle
[ https://issues.apache.org/jira/browse/FLINK-32708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wencong Liu updated FLINK-32708: Summary: Fix the write logic in remote tier of Hybrid Shuffle (was: Fix the write logic in remote tier of hybrid shuffle) > Fix the write logic in remote tier of Hybrid Shuffle > > > Key: FLINK-32708 > URL: https://issues.apache.org/jira/browse/FLINK-32708 > Project: Flink > Issue Type: Bug > Components: Runtime / Network >Affects Versions: 1.18.0 >Reporter: Wencong Liu >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0 > > > Currently, on the writer side in the remote tier, the flag file indicating > the latest segment id is updated first, followed by the creation of the data > file. This results in an incorrect order of file creation and we should fix > it.
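The ordering fix the ticket describes can be sketched as follows (file names and layout are assumptions for illustration, not Flink's actual remote-tier format): write the segment data file first, then update the flag file advertising the latest segment id, so a reader that trusts the flag never looks for a data file that does not exist yet.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch of the corrected write order on the remote tier.
public class SegmentWriter {
    static void writeSegment(Path dir, int segmentId, byte[] data) throws IOException {
        // 1. Create the segment data file first...
        Files.write(dir.resolve("seg-" + segmentId), data);
        // 2. ...and only then advertise it in the flag file. The buggy order
        //    (flag before data) lets a reader see an id with no backing file.
        Files.write(dir.resolve("latest-segment"),
                Integer.toString(segmentId).getBytes());
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("remote-tier");
        writeSegment(dir, 0, new byte[]{1, 2, 3});
        int latest = Integer.parseInt(Files.readString(dir.resolve("latest-segment")));
        // Invariant: the advertised segment's data file exists.
        if (!Files.exists(dir.resolve("seg-" + latest))) throw new AssertionError();
        System.out.println("latest segment " + latest + " is readable");
    }
}
```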
[GitHub] [flink-connector-hbase] MartijnVisser commented on pull request #17: [FLINK-28013] Shade all netty dependencies in sql jar
MartijnVisser commented on PR #17: URL: https://github.com/apache/flink-connector-hbase/pull/17#issuecomment-1655299963 > this modification is to add netty packages other than 'netty-all', such as 'netty-transport' here, to the sql jar. Ah sorry, I misread. I'll try to have a look later today
[jira] [Updated] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
[ https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-31168: -- Fix Version/s: 1.18.0 > JobManagerHAProcessFailureRecoveryITCase failed due to job not being found > -- > > Key: FLINK-31168 > URL: https://issues.apache.org/jira/browse/FLINK-31168 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.3, 1.16.1, 1.18.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Major > Labels: test-stability > Fix For: 1.18.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706 > We see this build failure because a job couldn't be found: > {code} > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:319) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:958) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase.testJobManagerFailure(JobManagerHAProcessFailureRecoveryITCase.java:235) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase$4.run(JobManagerHAProcessFailureRecoveryITCase.java:336) > Caused by: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > at > 
org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056) > ... 4 more > Caused by: java.lang.RuntimeException: Error while waiting for job to be > initialized > at > org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:160) > at > org.apache.flink.client.deployment.executors.AbstractSessionClusterExecutor.lambda$execute$2(AbstractSessionClusterExecutor.java:82) > at > org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73) > at > java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642) > at > java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479) > at > java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) > at > java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) > at > java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) > at > java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) > at > java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) > Caused by: java.util.concurrent.ExecutionException: > org.apache.flink.runtime.rest.util.RestClientException: > [org.apache.flink.runtime.rest.NotFoundException: Job > 865dcd87f4828dbeb3d93eb52e2636b1 not found > at > org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$1(AbstractExecutionGraphHandler.java:99) > at > java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986) > at > java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970) > at > java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > at > java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > at > 
org.apache.flink.runtime.rest.handler.legacy.DefaultExecutionGraphCache.lambda$getExecutionGraphInternal$0(DefaultExecutionGraphCache.java:109) > at > java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) > at > java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) > at > java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > at > java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > at >
[jira] [Updated] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
[ https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Pohl updated FLINK-31168: -- Priority: Blocker (was: Major) > JobManagerHAProcessFailureRecoveryITCase failed due to job not being found > -- > > Key: FLINK-31168 > URL: https://issues.apache.org/jira/browse/FLINK-31168 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.3, 1.16.1, 1.18.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Blocker > Labels: test-stability > Fix For: 1.18.0
[jira] [Commented] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
[ https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748491#comment-17748491 ] Matthias Pohl commented on FLINK-31168: --- The most-recent failures seem to have been caused by the job recovery not being successful: * [20230711.1 (#51165) Dispatcher #1 output in line 8688|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51165=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=8688] * [20230717.1 (#51299) Dispatcher #1 output in line 9918|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9918] I'm gonna raise the priority of this issue to blocker because it could be related to the leader election changes. > JobManagerHAProcessFailureRecoveryITCase failed due to job not being found > -- > > Key: FLINK-31168 > URL: https://issues.apache.org/jira/browse/FLINK-31168 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.3, 1.16.1, 1.18.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Major > Labels: test-stability
[jira] [Assigned] (FLINK-32667) Use standalone store and embedding writer for jobs with no-restart-strategy in session cluster
[ https://issues.apache.org/jira/browse/FLINK-32667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang Yong reassigned FLINK-32667: - Assignee: Fang Yong > Use standalone store and embedding writer for jobs with no-restart-strategy > in session cluster > -- > > Key: FLINK-32667 > URL: https://issues.apache.org/jira/browse/FLINK-32667 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.18.0 >Reporter: Fang Yong >Assignee: Fang Yong >Priority: Major > Labels: pull-request-available > > When a Flink session cluster uses the ZooKeeper or Kubernetes high-availability service, it stores jobs in ZooKeeper or in a ConfigMap. Flink OLAP jobs submitted to the session cluster always turn off the restart strategy, so these no-restart-strategy jobs should not be stored in ZooKeeper or in a Kubernetes ConfigMap. -- This message was sent by Atlassian Jira (v8.20.10#820010)
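The selection logic described in the ticket can be sketched as follows. This is a minimal illustration of the idea, not Flink's actual API: the class and method names (`JobGraphStore`, `storeFor`, etc.) are assumptions, and the real implementation lives in the dispatcher's HA services.

```java
// Hypothetical sketch: jobs that disable the restart strategy bypass the
// durable (ZooKeeper/ConfigMap-backed) job graph store and use a standalone,
// in-memory store instead, since they will never be recovered.
import java.util.HashMap;
import java.util.Map;

public class JobStoreSelector {
    interface JobGraphStore { void put(String jobId, byte[] graph); }

    // Durable store, standing in for a ZooKeeper- or ConfigMap-backed store.
    static class HaJobGraphStore implements JobGraphStore {
        final Map<String, byte[]> backing = new HashMap<>();
        public void put(String jobId, byte[] graph) { backing.put(jobId, graph); }
    }

    // Standalone store: nothing is persisted, so nothing can be recovered.
    static class StandaloneJobGraphStore implements JobGraphStore {
        public void put(String jobId, byte[] graph) { /* intentionally a no-op */ }
    }

    static JobGraphStore storeFor(boolean restartStrategyEnabled) {
        return restartStrategyEnabled
                ? new HaJobGraphStore()
                : new StandaloneJobGraphStore();
    }

    public static void main(String[] args) {
        assert storeFor(true) instanceof HaJobGraphStore;
        assert storeFor(false) instanceof StandaloneJobGraphStore;
    }
}
```

The design point is that persistence only pays off when a job can be restarted after a failover; for OLAP jobs it is pure latency overhead.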
[GitHub] [flink] flinkbot commented on pull request #23101: [FLINK-32667][dispatcher] Do not save job without restart strategy to job writer and store
flinkbot commented on PR #23101: URL: https://github.com/apache/flink/pull/23101#issuecomment-1655290894 ## CI report: * 9e88a35aafee8645ebbabb3225b24fa8052b4bcd UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #23100: [FLINK-32708][network] Fix the write logic in remote tier of hybrid shuffle
flinkbot commented on PR #23100: URL: https://github.com/apache/flink/pull/23100#issuecomment-1655282000 ## CI report: * 30bae68215a49365875c956eadde2a36929f0aea UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-32667) Use standalone store and embedding writer for jobs with no-restart-strategy in session cluster
[ https://issues.apache.org/jira/browse/FLINK-32667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-32667: --- Labels: pull-request-available (was: ) > Use standalone store and embedding writer for jobs with no-restart-strategy > in session cluster > -- > > Key: FLINK-32667 > URL: https://issues.apache.org/jira/browse/FLINK-32667 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.18.0 >Reporter: Fang Yong >Priority: Major > Labels: pull-request-available > > When a Flink session cluster uses the ZooKeeper or Kubernetes high-availability service, it stores jobs in ZooKeeper or in a ConfigMap. Flink OLAP jobs submitted to the session cluster always turn off the restart strategy, so these no-restart-strategy jobs should not be stored in ZooKeeper or in a Kubernetes ConfigMap. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] FangYongs opened a new pull request, #23101: [FLINK-32667][dispatcher] Do not save job without restart strategy to job writer and store
FangYongs opened a new pull request, #23101: URL: https://github.com/apache/flink/pull/23101 ## What is the purpose of the change This PR avoids saving jobs that have no restart strategy to the job writer and job graph store. In OLAP scenarios, jobs never have a restart strategy, and saving them to ZooKeeper (or to a ConfigMap on Kubernetes) adds latency; persisting this information is unnecessary for such jobs. ## Brief change log - Add DispatcherJobStorage for Dispatcher to manage job writer and store - Register job to DispatcherJobStorage if it has no restart strategy - Do not save job information if job has no restart strategy - Unregister job in DispatcherJobStorage when the job reaches a terminal state ## Verifying this change This change added tests and can be verified as follows: - Added `testCleanupWhenJobFinishedWithNoRestart` and `testCleanupWhenJobCanceledWithNoRestart` to test register and unregister in Dispatcher ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / no) no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / no) no - The serializers: (yes / no / don't know) no - The runtime per-record code paths (performance sensitive): (yes / no / don't know) no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know) no - The S3 file system connector: (yes / no / don't know) no ## Documentation - Does this pull request introduce a new feature? (yes / no) no - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
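The register/unregister lifecycle listed in the PR's change log can be sketched like this. The class name mirrors the PR's `DispatcherJobStorage`, but the body is a hedged illustration of the described behavior, not the actual Flink code:

```java
// Sketch of the proposed lifecycle: no-restart jobs are tracked only in
// memory on submission and dropped once they reach a terminal state, so
// they never touch ZooKeeper or a Kubernetes ConfigMap.
import java.util.HashSet;
import java.util.Set;

public class DispatcherJobStorageSketch {
    private final Set<String> noRestartJobs = new HashSet<>();

    /** Returns true if the job is tracked in memory (i.e. not persisted). */
    boolean register(String jobId, boolean hasRestartStrategy) {
        if (hasRestartStrategy) {
            return false; // would go through the normal HA job graph store
        }
        return noRestartJobs.add(jobId); // tracked in memory only
    }

    /** Called when the job finishes, fails terminally, or is canceled. */
    void onTerminalState(String jobId) {
        noRestartJobs.remove(jobId);
    }

    boolean isTracked(String jobId) {
        return noRestartJobs.contains(jobId);
    }

    public static void main(String[] args) {
        DispatcherJobStorageSketch storage = new DispatcherJobStorageSketch();
        storage.register("olap-job", false);
        storage.onTerminalState("olap-job");
        assert !storage.isTracked("olap-job");
    }
}
```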
[jira] [Updated] (FLINK-32708) Fix the write logic in remote tier of hybrid shuffle
[ https://issues.apache.org/jira/browse/FLINK-32708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-32708: --- Labels: pull-request-available (was: ) > Fix the write logic in remote tier of hybrid shuffle > > > Key: FLINK-32708 > URL: https://issues.apache.org/jira/browse/FLINK-32708 > Project: Flink > Issue Type: Bug > Components: Runtime / Network >Affects Versions: 1.18.0 >Reporter: Wencong Liu >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0 > > > Currently, on the writer side in the remote tier, the flag file indicating > the latest segment id is updated first, followed by the creation of the data > file. This results in an incorrect order of file creation and we should fix > it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
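The ordering bug described in FLINK-32708 (flag file advanced before the data file exists) and its fix can be illustrated with plain files. File names and the method are illustrative assumptions, not Flink's remote-tier implementation:

```java
// Sketch of the corrected write order: create the segment data file first,
// then update the flag file that advertises the latest segment id. With the
// buggy (reversed) order, a reader could observe the new segment id while
// the corresponding data file does not exist yet.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RemoteTierWriteOrder {
    static void writeSegment(Path dir, int segmentId, byte[] data) throws IOException {
        Path segment = dir.resolve("seg-" + segmentId);
        Path flag = dir.resolve("latest-segment-id");
        Files.write(segment, data);                               // 1. data file first
        Files.write(flag, String.valueOf(segmentId).getBytes());  // 2. then the flag
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("tier");
        writeSegment(dir, 3, new byte[]{1, 2});
        // Whenever the flag names a segment, that segment's file already exists.
        int latest = Integer.parseInt(Files.readString(dir.resolve("latest-segment-id")));
        assert Files.exists(dir.resolve("seg-" + latest));
    }
}
```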
[GitHub] [flink] WencongLiu opened a new pull request, #23100: [FLINK-32708][network] Fix the write logic in remote tier of hybrid shuffle
WencongLiu opened a new pull request, #23100: URL: https://github.com/apache/flink/pull/23100 ## What is the purpose of the change *Fix the write logic in remote tier of hybrid shuffle.* ## Brief change log - *Fix the write logic in remote tier of hybrid shuffle.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not applicable) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-32709) Improve memory utilization for Hybrid Shuffle
[ https://issues.apache.org/jira/browse/FLINK-32709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuxin Tan updated FLINK-32709: -- Summary: Improve memory utilization for Hybrid Shuffle (was: Modify segment size to improve memory utilization for Hybrid Shuffle) > Improve memory utilization for Hybrid Shuffle > - > > Key: FLINK-32709 > URL: https://issues.apache.org/jira/browse/FLINK-32709 > Project: Flink > Issue Type: Bug > Components: Runtime / Network >Affects Versions: 1.18.0 >Reporter: Yuxin Tan >Assignee: Yuxin Tan >Priority: Major > Labels: pull-request-available > > Currently, each subpartition in Disk/Remote has a segment size of 8M. When > writing segments to the Disk tier with a parallelism of 1000, only shuffle > data exceeding 1000 * 8M can be written to the Memory tier again. However, > for most shuffles, the data volume size falls below this limit, significantly > impacting Memory tier utilization. > For better performance, it is necessary to address this issue to improve the > memory tier utilization. -- This message was sent by Atlassian Jira (v8.20.10#820010)
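The "1000 * 8M" limit from the ticket is simple arithmetic, sketched below as a back-of-the-envelope check (8M is read as 8 MiB; the threshold formula is an assumption based on the ticket's description):

```java
// With an 8 MiB segment per subpartition and parallelism 1000, roughly
// 8000 MiB of shuffle data must accumulate before data can be written to
// the memory tier again -- far more than most shuffles produce.
public class SegmentMath {
    static long thresholdBytes(long segmentBytes, int parallelism) {
        return segmentBytes * parallelism;
    }

    public static void main(String[] args) {
        long segmentBytes = 8L * 1024 * 1024; // 8 MiB per subpartition segment
        long threshold = thresholdBytes(segmentBytes, 1000);
        System.out.println(threshold / (1024 * 1024) + " MiB"); // prints "8000 MiB"
    }
}
```

Shrinking the segment size lowers this threshold proportionally, which is why the ticket frames it as a memory-utilization improvement.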
[GitHub] [flink] LadyForest closed pull request #23065: [FLINK-32656][table] Deprecate ManagedTable related APIs
LadyForest closed pull request #23065: [FLINK-32656][table] Deprecate ManagedTable related APIs URL: https://github.com/apache/flink/pull/23065 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #23099: [FLINK-32583][rest] Fix deadlock in RestClient (backport to 1.16)
flinkbot commented on PR #23099: URL: https://github.com/apache/flink/pull/23099#issuecomment-1655250466 ## CI report: * eb0bbb685ade3539ddf1ad08755b2db9be306ec3 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-connector-hbase] whhe commented on pull request #17: [FLINK-28013] Shade all netty dependencies in sql jar
whhe commented on PR #17: URL: https://github.com/apache/flink-connector-hbase/pull/17#issuecomment-1655247084 > @whhe The connector actually shouldn't rely on `flink-shaded` at all. It will not depend on 'flink-shaded'; this modification adds the netty packages other than 'netty-all' (such as 'netty-transport' here) to the sql jar. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
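Shading individual netty artifacts into a sql jar typically looks like the maven-shade-plugin fragment below. This is an illustrative sketch only: the artifact list and the relocation prefix are assumptions, not the actual contents of the PR.

```xml
<!-- Hypothetical fragment: include the individual netty artifacts
     (not just netty-all) in the sql jar and relocate them so they
     cannot clash with a user's own netty version. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <artifactSet>
      <includes>
        <include>io.netty:netty-all</include>
        <include>io.netty:netty-transport</include>
      </includes>
    </artifactSet>
    <relocations>
      <relocation>
        <pattern>io.netty</pattern>
        <shadedPattern>org.apache.flink.hbase.shaded.io.netty</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```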
[jira] [Comment Edited] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
[ https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748445#comment-17748445 ] Matthias Pohl edited comment on FLINK-31168 at 7/28/23 8:03 AM: For the record: The reported cases do not always have the same cause. ||Build||Branch||Comment||Description|| |[20230221.2 (#46342)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706]|release-1.15 (c6b649bf)|Jira issue description|Job was not found during initialization phase of the job| |[20221014.20 (#42038)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=42038=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba=9922]|1.16 PR|1st report in comments|Error after job finished| |[20232504.1 (#48442)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48442=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12170]|master (1.18)|2nd report in comments|Error after job finished| |[20230629.1 (#50607)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=50607=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9363]|master (74ae4b24)|3rd report in comments|Ask timeout| |[20230711.1 (#51165)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51165=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9898]|master (4cf2124d)|4th report in comments|Error while restarting the dispatcher (job not finished)| |[20230717.1 (#51299)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=10039]|master (4690dc29)|5th report in comments|Error while restarting the dispatcher (job not finished)| [20230711.1 
(#51165)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51165=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9898] is the first failure that included the flink-shaded 17.0 upgrade (FLINK-32032). was (Author: mapohl): For the record: The reported cases do not always have the same cause. ||Build||Branch||Comment||Description|| |[20230221.2 (#46342)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706]|release-1.15 (c6b649bf)|Jira issue description|Job was not found during initialization phase of the job| |[20221014.20 (#42038)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=42038=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba=9922]|1.16 PR|1st report in comments|Error after job finished| |[20232504.1 (#48442)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48442=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12170]|master (1.18)|2nd report in comments|Error after job finished| |[20230629.1 (#50607)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=50607=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9363]|master (74ae4b24)|3rd report in comments|Ask timeout| |[20230711.1 (#51165)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51165=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9898]|master (4cf2124d)|4th report in comments|Error while restarting the dispatcher (job not finished)| |[20230717.1 (#51299)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=10039]|master (4690dc29)|5th report in comments|Error while restarting the dispatcher (job not finished)| > 
JobManagerHAProcessFailureRecoveryITCase failed due to job not being found > -- > > Key: FLINK-31168 > URL: https://issues.apache.org/jira/browse/FLINK-31168 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.3, 1.16.1, 1.18.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Major > Labels: test-stability
[jira] [Comment Edited] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
[ https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748445#comment-17748445 ] Matthias Pohl edited comment on FLINK-31168 at 7/28/23 7:37 AM: For the record: The reported cases do not always have the same cause. ||Build||Branch||Comment||Description|| |[20230221.2 (#46342)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706]|release-1.15 (c6b649bf)|Jira issue description|Job was not found during initialization phase of the job| |[20221014.20 (#42038)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=42038=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba=9922]|1.16 PR|1st report in comments|Error after job finished| |[20232504.1 (#48442)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48442=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12170]|master (1.18)|2nd report in comments|Error after job finished| |[20230629.1 (#50607)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=50607=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9363]|master (74ae4b24)|3rd report in comments|Ask timeout| |[20230711.1 (#51165)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51165=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9898]|master (4cf2124d)|4th report in comments|Error while restarting the dispatcher (job not finished)| |[20230717.1 (#51299)|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=10039]|master (4690dc29)|5th report in comments|Error while restarting the dispatcher (job not finished)| was (Author: mapohl): For the record: The reported cases do not always have the same 
cause. ||Build||Branch||Comment||Description|| |[20230221.2|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706]|release-1.15 (c6b649bf)|Jira issue description|Job was not found during initialization phase of the job| |[20221014.20|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=42038=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba=9922]|1.16 PR|1st report in comments|Error after job finished| |[20232504.1|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48442=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12170]|master (1.18)|2nd report in comments|Error after job finished| |[20230629.1|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=50607=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9363]|master (74ae4b24)|3rd report in comments|Ask timeout| |[20230711.1|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51165=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9898]|master (4cf2124d)|4th report in comments|Error while restarting the dispatcher (job not finished)| |[20230717.1|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=10039]|master (4690dc29)|5th report in comments|Error while restarting the dispatcher (job not finished)| > JobManagerHAProcessFailureRecoveryITCase failed due to job not being found > -- > > Key: FLINK-31168 > URL: https://issues.apache.org/jira/browse/FLINK-31168 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.3, 1.16.1, 1.18.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Major > Labels: test-stability > > 
[jira] [Commented] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
[ https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748445#comment-17748445 ] Matthias Pohl commented on FLINK-31168: --- For the record: The reported cases do not always have the same cause.

||Build||Branch||Comment||Description||
|[20230221.2|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706]|release-1.15 (c6b649bf)|Jira issue description|Job was not found during initialization phase of the job|
|[20221014.20|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=42038=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=86f654fa-ab48-5c1a-25f4-7e7f6afb9bba=9922]|1.16 PR|1st report in comments|Error after job finished|
|[20232504.1|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48442=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12170]|master (1.18)|2nd report in comments|Error after job finished|
|[20230629.1|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=50607=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9363]|master (74ae4b24)|3rd report in comments|Ask timeout|
|[20230711.1|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51165=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=9898]|master (4cf2124d)|4th report in comments|Error while restarting the dispatcher (job not finished)|
|[20230717.1|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=51299=logs=a596f69e-60d2-5a4b-7d39-dc69e4cdaed3=712ade8c-ca16-5b76-3acd-14df33bc1cb1=10039]|master (4690dc29)|5th report in comments|Error while restarting the dispatcher (job not finished)|

> JobManagerHAProcessFailureRecoveryITCase failed due to job not being found > -- > > Key: FLINK-31168 > URL: https://issues.apache.org/jira/browse/FLINK-31168 > 
Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.3, 1.16.1, 1.18.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Major > Labels: test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706 > We see this build failure because a job couldn't be found: > {code} > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:319) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:958) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase.testJobManagerFailure(JobManagerHAProcessFailureRecoveryITCase.java:235) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase$4.run(JobManagerHAProcessFailureRecoveryITCase.java:336) > Caused by: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056) > ... 
4 more > Caused by: java.lang.RuntimeException: Error while waiting for job to be > initialized > at > org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:160) > at > org.apache.flink.client.deployment.executors.AbstractSessionClusterExecutor.lambda$execute$2(AbstractSessionClusterExecutor.java:82) > at > org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73) > at > java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642) > at > java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479) > at > java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) > at > java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) > at > java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) > at >
[GitHub] [flink-connector-pulsar] tisonkun commented on pull request #56: [FLINK-26203] Basic table factory for Pulsar connector
tisonkun commented on PR #56: URL: https://github.com/apache/flink-connector-pulsar/pull/56#issuecomment-1655131338

Support keybytes for now to minimize code change. Following work:
1. Add SQL client E2E test
2. Add upsert table factory
3. Revisit key serialization support. I know that users can have biz data in keys, but I don't know if it's common.

@leonardBang @PatrickRen does Kafka support mapping message key to Flink SQL columns? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Comment Edited] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
[ https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748254#comment-17748254 ] Matthias Pohl edited comment on FLINK-31168 at 7/28/23 6:46 AM:

The error appears in [FileExecutionGraphInfoStore#getAvailableJobDetails(JobID)|https://github.com/apache/flink/blob/d78d52b27af2550f50b44349d3ec6dc84b966a8a/flink-runtime/src/main/java/org/apache/flink/runtime/dispatcher/FileExecutionGraphInfoStore.java#L237] where the JobDetails are not cached. The {{FileExecutionGraphInfoStore}} uses two caches:

* {{jobDetailsCache}} holds the {{JobDetails}} of each job. It has a default expiration time of 3600s (see [jobstore.expiration-time default|https://github.com/apache/flink/blob/f3598c50c0d3dcdf8058b01f13b7eb9fc5954f7c/flink-core/src/main/java/org/apache/flink/configuration/JobManagerOptions.java#L340]). The size of the cache is limited to {{Integer.MAX_VALUE}} entries (see [jobstore.max-capacity default|https://github.com/apache/flink/blob/f3598c50c0d3dcdf8058b01f13b7eb9fc5954f7c/flink-core/src/main/java/org/apache/flink/configuration/JobManagerOptions.java#L350]). Removing entries from this cache triggers the removal of the file and invalidates the corresponding entries in both caches. Removal can happen even earlier, though.
* {{executionGraphInfoCache}} holds the {{ExecutionGraphInfo}}. It's limited to 50MB (see [jobstore.cache-size default|https://github.com/apache/flink/blob/f3598c50c0d3dcdf8058b01f13b7eb9fc5954f7c/flink-core/src/main/java/org/apache/flink/configuration/JobManagerOptions.java#L331]). The {{ExecutionGraphInfo}} will be reloaded from disk in case it was removed from the cache before.

This error can be simulated by reducing either the expiry time of the {{jobDetailsCache}} or decreasing the entry limit of that cache. One suspicion is that the entry gets removed earlier. 
The [CacheBuilder#maximumSize JavaDoc|https://guava.dev/releases/19.0/api/docs/com/google/common/cache/CacheBuilder.html#maximumSize(long)] states that this can happen: {quote} Note that the cache may evict an entry before this limit is exceeded. {quote} We should investigate whether the error started to appear with the {{flink-shaded}} update (FLINK-30772, FLINK-32032) where the update might have included a change in how the cache operates. I will continue investigating this tomorrow. > JobManagerHAProcessFailureRecoveryITCase failed due to job not being found > -- > > Key: FLINK-31168 > URL: https://issues.apache.org/jira/browse/FLINK-31168 > Project: Flink
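The failure mode described in the comment above — a size- or time-bounded cache silently dropping an entry, which then surfaces as "job not found" — can be illustrated with a minimal sketch. This is a hypothetical stdlib illustration, not Flink's actual {{FileExecutionGraphInfoStore}} (which uses a Guava cache with {{expireAfterWrite}} and {{maximumSize}}); the class and method names below are invented for the example:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of size-bounded eviction, similar in spirit to
// the jobDetailsCache discussed in this ticket. Once the bound is reached,
// the least recently used entry is silently dropped, and a later lookup
// returns null -- the analogue of the "Job ... not found" REST error.
public class BoundedJobDetailsCache {
    private final Map<String, String> cache;

    public BoundedJobDetailsCache(int maxEntries) {
        // accessOrder=true gives LRU semantics; removeEldestEntry enforces the bound.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public void put(String jobId, String details) {
        cache.put(jobId, details);
    }

    // Returns null when the entry was already evicted.
    public String get(String jobId) {
        return cache.get(jobId);
    }

    public static void main(String[] args) {
        BoundedJobDetailsCache c = new BoundedJobDetailsCache(2);
        c.put("job-1", "FINISHED");
        c.put("job-2", "FINISHED");
        c.put("job-3", "FINISHED"); // evicts job-1, the least recently used entry
        System.out.println(c.get("job-1")); // null: "job not found"
        System.out.println(c.get("job-3")); // FINISHED
    }
}
```

In the real store the situation is subtler, because (per the Guava JavaDoc quoted above) eviction may happen even before the limit is reached, so a lookup can fail although the cache never appeared full.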
[jira] [Commented] (FLINK-31168) JobManagerHAProcessFailureRecoveryITCase failed due to job not being found
[ https://issues.apache.org/jira/browse/FLINK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748436#comment-17748436 ] Matthias Pohl commented on FLINK-31168: --- My suspicion doesn't seem to hold: I couldn't identify any change between guava 30.1 and guava 31.1 (I checked the sources and the release notes) that would have been an explanation for different eviction behavior. Additionally, the [CacheBuilder#maximumSize JavaDoc|https://guava.dev/releases/19.0/api/docs/com/google/common/cache/CacheBuilder.html#maximumSize(long)] states that entries are only evicted if the cache is getting close to its limit (which is {{Integer.MAX_VALUE}}). The test itself only runs a single job which would lead to one entry in the cache. > JobManagerHAProcessFailureRecoveryITCase failed due to job not being found > -- > > Key: FLINK-31168 > URL: https://issues.apache.org/jira/browse/FLINK-31168 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.15.3, 1.16.1, 1.18.0 >Reporter: Matthias Pohl >Assignee: Matthias Pohl >Priority: Major > Labels: test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46342=logs=b0a398c0-685b-599c-eb57-c8c2a771138e=747432ad-a576-5911-1e2a-68c6bedc248a=12706 > We see this build failure because a job couldn't be found: > {code} > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:319) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1061) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:958) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942) > at > 
org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase.testJobManagerFailure(JobManagerHAProcessFailureRecoveryITCase.java:235) > at > org.apache.flink.test.recovery.JobManagerHAProcessFailureRecoveryITCase$4.run(JobManagerHAProcessFailureRecoveryITCase.java:336) > Caused by: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error while waiting for job to be initialized > at > java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) > at > java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) > at > org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056) > ... 4 more > Caused by: java.lang.RuntimeException: Error while waiting for job to be > initialized > at > org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:160) > at > org.apache.flink.client.deployment.executors.AbstractSessionClusterExecutor.lambda$execute$2(AbstractSessionClusterExecutor.java:82) > at > org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73) > at > java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642) > at > java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479) > at > java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) > at > java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) > at > java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) > at > java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) > at > java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) > Caused by: java.util.concurrent.ExecutionException: > org.apache.flink.runtime.rest.util.RestClientException: > [org.apache.flink.runtime.rest.NotFoundException: Job > 865dcd87f4828dbeb3d93eb52e2636b1 not found > at > 
org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$1(AbstractExecutionGraphHandler.java:99) > at > java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986) > at > java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970) > at > java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > at > java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > at > org.apache.flink.runtime.rest.handler.legacy.DefaultExecutionGraphCache.lambda$getExecutionGraphInternal$0(DefaultExecutionGraphCache.java:109)
[jira] (FLINK-26541) SQL Client should support submitting SQL jobs in application mode
[ https://issues.apache.org/jira/browse/FLINK-26541 ] shizhengchao deleted comment on FLINK-26541: -- was (Author: tinny): Why not use session mode? If application mode supports SQL, then I think there is no difference from session mode. > SQL Client should support submitting SQL jobs in application mode > - > > Key: FLINK-26541 > URL: https://issues.apache.org/jira/browse/FLINK-26541 > Project: Flink > Issue Type: Improvement > Components: Deployment / Kubernetes, Deployment / YARN, Table SQL / > Client >Reporter: Jark Wu >Priority: Major > > Currently, the SQL Client only supports submitting jobs in session mode and > per-job mode. As the community is going to drop the per-job mode (FLINK-26000), > SQL Client should support application mode as well. Otherwise, SQL Client can > only submit SQL in session mode, but streaming jobs should be submitted > in per-job or application mode to have better resource isolation. > Discussions: https://lists.apache.org/thread/2yq351nb721x23rz1q8qlyf2tqrk147r -- This message was sent by Atlassian Jira (v8.20.10#820010)
[GitHub] [flink] hanyuzheng7 closed pull request #23098: [FLINK-32262] Add MAP_ENTRIES support
hanyuzheng7 closed pull request #23098: [FLINK-32262] Add MAP_ENTRIES support URL: https://github.com/apache/flink/pull/23098
[GitHub] [flink] flinkbot commented on pull request #23098: [FLINK-32262] Add MAP_ENTRIES support
flinkbot commented on PR #23098: URL: https://github.com/apache/flink/pull/23098#issuecomment-1655074207

## CI report:

* 2a0667be43d20203ef1c1b05119c87c1efbd9030 UNKNOWN

Bot commands: The @flinkbot bot supports the following commands:
- `@flinkbot run azure` re-run the last Azure build