[jira] [Created] (FLINK-23174) Log improvement in Task throws Error
Bo Cui created FLINK-23174:

Summary: Log improvement in Task throws Error
Key: FLINK-23174
URL: https://issues.apache.org/jira/browse/FLINK-23174
Project: Flink
Issue Type: Improvement
Reporter: Bo Cui

We saw some channels close due to network jitter, which caused tasks to fail. The log only shows which remote channel caused the task/job failure. It gives no further details, such as which channel closed, the task stack, and so on.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23173) Inconsistency detected by ld.so
Xintong Song created FLINK-23173:

Summary: Inconsistency detected by ld.so
Key: FLINK-23173
URL: https://issues.apache.org/jira/browse/FLINK-23173
Project: Flink
Issue Type: Bug
Affects Versions: 1.11.3
Reporter: Xintong Song

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19647=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=2b7514ee-e706-5046-657b-3430666e7bd9=7236

The test fails because one of the TMs terminated unexpectedly. The following error message is found in the stdout of the problematic TM.

{code}
Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!
{code}
[jira] [Created] (FLINK-23172) Links of restart strategy in configuration page is broken
Zhilong Hong created FLINK-23172:

Summary: Links of restart strategy in configuration page is broken
Key: FLINK-23172
URL: https://issues.apache.org/jira/browse/FLINK-23172
Project: Flink
Issue Type: Technical Debt
Components: Documentation
Affects Versions: 1.14.0
Reporter: Zhilong Hong
Fix For: 1.14.0

The links in the Fault Tolerance section of [the configuration page|https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/config/#fault-tolerance/] are broken. Currently the link points to https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/dev/task_failure_recovery.html#fixed-delay-restart-strategy, which doesn't exist and leads to a 404 error. The correct link is https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/state/task_failure_recovery/#fixed-delay-restart-strategy.
[DISCUSS] Better user experience in the WindowAggregate upon Changelog (contains update message)
When WindowAggregate runs on a changelog that contains update messages, an UPDATE BEFORE (UB) message may be dropped as a late message. [1]

To handle late UB messages, users currently need to set *all* of the following 3 parameters:
(1) enable late fire by setting table.exec.emit.late-fire.enabled: true
(2) set per-record emit behavior for late records by setting table.exec.emit.late-fire.delay: 0 s
(3) keep window state for extra time after the window is fired by setting table.exec.emit.allow-lateness: 1 h // or table.exec.state.ttl: 1 h

This solution has two disadvantages:
(1) Users may not realize that UB messages can be dropped as late events, so they will not set the related parameters.
(2) When users look for a way to avoid dropped UB messages, the current solution is inconvenient because they need to set all 3 parameters. Besides, some of the configurations overlap in what they do.

There are two proposals to simplify the 3 parameters a little:
(1) Users only need to set table.exec.emit.allow-lateness (just like on DataStream, where users only need to set allow-lateness); the framework could automatically set `table.exec.emit.late-fire.enabled` to true and `table.exec.emit.late-fire.delay` to 0 s. In a later version, we deprecate `table.exec.emit.late-fire.delay` and `table.exec.emit.late-fire.enabled`.
(2) Users need to set `table.exec.emit.late-fire.enabled` to true and set `table.exec.state.ttl`; the framework could automatically set `table.exec.emit.late-fire.delay` to 0 s. In a later version, we deprecate `table.exec.emit.late-fire.delay` and `table.exec.emit.allow-lateness`.

Please let me know what you think about the issue. Thank you.

[1] https://issues.apache.org/jira/browse/FLINK-22781

Best regards,
JING ZHANG
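To make proposal (1) a bit more concrete, here is a minimal Java sketch. It is not Flink code: `LateFireDefaults` and `applyProposalOne` are hypothetical names, and the configuration is modeled as a plain string map instead of Flink's own configuration classes. Only the three option keys are taken from the mail above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of Flink: models proposal (1) on a plain string map. */
final class LateFireDefaults {
    // Option keys as quoted in the discussion.
    static final String ALLOW_LATENESS = "table.exec.emit.allow-lateness";
    static final String LATE_FIRE_ENABLED = "table.exec.emit.late-fire.enabled";
    static final String LATE_FIRE_DELAY = "table.exec.emit.late-fire.delay";

    /** Returns a copy of the user config with the late-fire options derived from allow-lateness. */
    static Map<String, String> applyProposalOne(Map<String, String> userConfig) {
        Map<String, String> effective = new HashMap<>(userConfig);
        if (effective.containsKey(ALLOW_LATENESS)) {
            // Assumption of this sketch: explicit user values are overwritten.
            // The proposal does not say how conflicting settings should be handled.
            effective.put(LATE_FIRE_ENABLED, "true");
            effective.put(LATE_FIRE_DELAY, "0 s");
        }
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> user = new HashMap<>();
        user.put(ALLOW_LATENESS, "1 h");
        // Prints the effective configuration with all three options present.
        System.out.println(applyProposalOne(user));
    }
}
```

With this shape the user sets a single option and the other two follow from it, which is the simplification proposal (1) is after. Proposal (2) would look the same with `table.exec.state.ttl` as the trigger and only the delay derived.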
Re: [VOTE] FLIP-147: Support Checkpoint After Tasks Finished
+1 (binding)

Best
liujiangang

On Tue, Jun 29, 2021 at 2:05 AM Piotr Nowojski wrote:

> +1 (binding)
>
> Piotrek
>
> On Mon, Jun 28, 2021 at 12:48 Dawid Wysakowicz wrote:
>
> > +1 (binding)
> >
> > Best,
> >
> > Dawid
> >
> > On 28/06/2021 10:45, Yun Gao wrote:
> > > Hi all,
> > >
> > > For FLIP-147 [1], which targets supporting checkpoints after tasks have finished and modifying the operator API and implementation to ensure the commit of the last piece of data: since we have had more discussions [2][3] and a few updates after the last vote, including changes to the PublicEvolving API, I'd like to have another VOTE on the current state of the FLIP.
> > >
> > > The vote will last at least 72 hours (Jul 1st), following the consensus voting process.
> > >
> > > thanks,
> > > Yun
> > >
> > > [1] https://cwiki.apache.org/confluence/x/mw-ZCQ
> > > [2] https://lists.apache.org/thread.html/r400da9898ff66fd613c25efea15de440a86f14758ceeae4950ea25cf%40%3Cdev.flink.apache.org
> > > [3] https://lists.apache.org/thread.html/r3953df796ef5ac67d5be9f2251a95ad72efbca31f1d1555d13e71197%40%3Cdev.flink.apache.org%3E
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
+1 for the proposal. Since the test time is long and the environment may vary, unstable tests are really annoying for developers. The solution is welcome.

Best
liujiangang

On Tue, Jun 29, 2021 at 10:31 AM Jingsong Li wrote:

> +1 Thanks Xintong for the update!
>
> Best,
> Jingsong
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
+1 Thanks Xintong for the update!

Best,
Jingsong

On Mon, Jun 28, 2021 at 6:44 PM Till Rohrmann wrote:

> +1, thanks for updating the guidelines Xintong!
>
> Cheers,
> Till
Re: [VOTE] FLIP-172: Support custom transactional.id prefix in FlinkKafkaProducer
+1 (binding)

On Mon, Jun 28, 2021 at 8:04 PM Piotr Nowojski wrote:

> +1 (binding)
>
> Piotrek
>
> On Mon, Jun 28, 2021 at 16:01 Wenhao Ji wrote:
>
> > Hi everyone,
> >
> > I would like to start a vote on FLIP-172 [1] which was discussed in this thread [2].
> > The vote will be open for at least 72 hours until July 1 unless there is an objection or not enough votes.
> >
> > Thanks,
> > Wenhao
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-172%3A+Support+custom+transactional.id+prefix+in+FlinkKafkaProducer
> > [2] https://lists.apache.org/thread.html/r67610aa2d4dfdaf3b027b82edd1a3f46771f0d58902a4258d931e5a5%40%3Cdev.flink.apache.org%3E
Re: [VOTE] FLIP-147: Support Checkpoint After Tasks Finished
+1 (binding)

Piotrek

On Mon, Jun 28, 2021 at 12:48 Dawid Wysakowicz wrote:

> +1 (binding)
>
> Best,
>
> Dawid
>
> On 28/06/2021 10:45, Yun Gao wrote:
> > Hi all,
> >
> > For FLIP-147 [1], which targets supporting checkpoints after tasks have finished and modifying the operator API and implementation to ensure the commit of the last piece of data: since we have had more discussions [2][3] and a few updates after the last vote, including changes to the PublicEvolving API, I'd like to have another VOTE on the current state of the FLIP.
> >
> > The vote will last at least 72 hours (Jul 1st), following the consensus voting process.
> >
> > thanks,
> > Yun
> >
> > [1] https://cwiki.apache.org/confluence/x/mw-ZCQ
> > [2] https://lists.apache.org/thread.html/r400da9898ff66fd613c25efea15de440a86f14758ceeae4950ea25cf%40%3Cdev.flink.apache.org
> > [3] https://lists.apache.org/thread.html/r3953df796ef5ac67d5be9f2251a95ad72efbca31f1d1555d13e71197%40%3Cdev.flink.apache.org%3E
Re: [VOTE] FLIP-172: Support custom transactional.id prefix in FlinkKafkaProducer
+1 (binding)

Piotrek

On Mon, Jun 28, 2021 at 16:01 Wenhao Ji wrote:

> Hi everyone,
>
> I would like to start a vote on FLIP-172 [1] which was discussed in this thread [2].
> The vote will be open for at least 72 hours until July 1 unless there is an objection or not enough votes.
>
> Thanks,
> Wenhao
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-172%3A+Support+custom+transactional.id+prefix+in+FlinkKafkaProducer
> [2] https://lists.apache.org/thread.html/r67610aa2d4dfdaf3b027b82edd1a3f46771f0d58902a4258d931e5a5%40%3Cdev.flink.apache.org%3E
[jira] [Created] (FLINK-23171) Can't execute SET table.sql-dialect=hive;
JasonLee created FLINK-23171:

Summary: Can't execute SET table.sql-dialect=hive;
Key: FLINK-23171
URL: https://issues.apache.org/jira/browse/FLINK-23171
Project: Flink
Issue Type: Bug
Components: Connectors / Hive
Affects Versions: 1.13.1
Environment: Flink 1.13.1, Hive 2.3.4
Reporter: JasonLee
Fix For: 1.14.0

The SQL client throws an exception when I switch dialects like this:

SET table.sql-dialect=hive;

The exception is as follows:

{code:java}
Exception in thread "main" org.apache.flink.table.client.SqlClientException: Unexpected exception. This is a bug. Please consider filing an issue.
	at org.apache.flink.table.client.SqlClient.startClient(SqlClient.java:201)
	at org.apache.flink.table.client.SqlClient.main(SqlClient.java:161)
Caused by: java.lang.BootstrapMethodError: java.lang.NoSuchMethodError: org.apache.flink.table.planner.delegation.PlannerContext.createSqlExprToRexConverter(Lorg/apache/calcite/rel/type/RelDataType;)Lorg/apache/flink/table/planner/calcite/SqlExprToRexConverter;
	at org.apache.flink.table.planner.delegation.hive.HiveParserFactory.create(HiveParserFactory.java:39)
	at org.apache.flink.table.planner.delegation.PlannerBase.createNewParser(PlannerBase.scala:144)
	at org.apache.flink.table.planner.delegation.PlannerBase.getParser(PlannerBase.scala:149)
	at org.apache.flink.table.api.internal.TableEnvironmentImpl.getParser(TableEnvironmentImpl.java:1466)
	at org.apache.flink.table.api.internal.TableEnvironmentImpl.<init>(TableEnvironmentImpl.java:237)
	at org.apache.flink.table.api.bridge.java.internal.StreamTableEnvironmentImpl.<init>(StreamTableEnvironmentImpl.java:113)
	at org.apache.flink.table.client.gateway.context.ExecutionContext.createStreamTableEnvironment(ExecutionContext.java:156)
	at org.apache.flink.table.client.gateway.context.ExecutionContext.createTableEnvironment(ExecutionContext.java:116)
	at org.apache.flink.table.client.gateway.context.ExecutionContext.<init>(ExecutionContext.java:82)
	at org.apache.flink.table.client.gateway.context.SessionContext.set(SessionContext.java:156)
	at org.apache.flink.table.client.gateway.local.LocalExecutor.setSessionProperty(LocalExecutor.java:164)
	at org.apache.flink.table.client.cli.CliClient.callSet(CliClient.java:456)
	at org.apache.flink.table.client.cli.CliClient.callOperation(CliClient.java:403)
	at org.apache.flink.table.client.cli.CliClient.lambda$executeStatement$0(CliClient.java:327)
	at java.util.Optional.ifPresent(Optional.java:159)
	at org.apache.flink.table.client.cli.CliClient.executeStatement(CliClient.java:327)
	at org.apache.flink.table.client.cli.CliClient.executeInteractive(CliClient.java:297)
	at org.apache.flink.table.client.cli.CliClient.executeInInteractiveMode(CliClient.java:221)
	at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:151)
	at org.apache.flink.table.client.SqlClient.start(SqlClient.java:95)
	at org.apache.flink.table.client.SqlClient.startClient(SqlClient.java:187)
	... 1 more
Caused by: java.lang.NoSuchMethodError: org.apache.flink.table.planner.delegation.PlannerContext.createSqlExprToRexConverter(Lorg/apache/calcite/rel/type/RelDataType;)Lorg/apache/flink/table/planner/calcite/SqlExprToRexConverter;
	at java.lang.invoke.MethodHandleNatives.resolve(Native Method)
	at java.lang.invoke.MemberName$Factory.resolve(MemberName.java:975)
	at java.lang.invoke.MemberName$Factory.resolveOrFail(MemberName.java:1000)
	at java.lang.invoke.MethodHandles$Lookup.resolveOrFail(MethodHandles.java:1389)
	at java.lang.invoke.MethodHandles$Lookup.linkMethodHandleConstant(MethodHandles.java:1745)
	at java.lang.invoke.MethodHandleNatives.linkMethodHandleConstant(MethodHandleNatives.java:477)
	... 22 more
{code}

I guess there is a package conflict, but I can execute the same statement in version 1.13.0.
[VOTE] FLIP-172: Support custom transactional.id prefix in FlinkKafkaProducer
Hi everyone,

I would like to start a vote on FLIP-172 [1] which was discussed in this thread [2].
The vote will be open for at least 72 hours until July 1 unless there is an objection or not enough votes.

Thanks,
Wenhao

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-172%3A+Support+custom+transactional.id+prefix+in+FlinkKafkaProducer
[2] https://lists.apache.org/thread.html/r67610aa2d4dfdaf3b027b82edd1a3f46771f0d58902a4258d931e5a5%40%3Cdev.flink.apache.org%3E
[jira] [Created] (FLINK-23170) Write metadata after materialization
Roman Khachatryan created FLINK-23170:

Summary: Write metadata after materialization
Key: FLINK-23170
URL: https://issues.apache.org/jira/browse/FLINK-23170
Project: Flink
Issue Type: Sub-task
Components: Runtime / State Backends
Reporter: Roman Khachatryan
Fix For: 1.14.0

Currently, the changelog state backend writes state metadata on first state access. It is written to the changelog.

On materialization, the changelog can be truncated, so the metadata needs to be written again.

This can be achieved by resetting the AbstractStateChangeLogger.metaDataWritten flag.

It can be further optimized by storing the SQN at which the metadata was written and only resetting the flag if materializedSqn >= metadataSqn; but materialization is relatively rare, so it is probably not worth it.
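The flag reset described in the ticket could be sketched roughly as follows. This is a standalone illustration, not the actual Flink class: `MetadataFlagSketch`, `logChange`, `onMaterializedSimple`, `onMaterialized` and the write counter are hypothetical, and only the names `metaDataWritten`, `materializedSqn` and `metadataSqn` come from the ticket. The sketch assumes that truncation removes everything up to and including `materializedSqn`.

```java
/** Hypothetical stand-in for a state change logger; not the real AbstractStateChangeLogger. */
final class MetadataFlagSketch {
    private boolean metaDataWritten = false;
    private long metadataSqn = -1L;   // SQN at which the metadata was last written
    private int metadataWrites = 0;   // counts simulated metadata writes

    /** Called on state access: writes metadata once, then only the change itself. */
    void logChange(long currentSqn) {
        if (!metaDataWritten) {
            metadataWrites++;          // stands in for appending metadata to the changelog
            metadataSqn = currentSqn;
            metaDataWritten = true;
        }
        // The actual state change would be appended to the changelog here.
    }

    /** Simple variant from the ticket: always reset the flag on materialization. */
    void onMaterializedSimple() {
        metaDataWritten = false;
    }

    /** Optimized variant: reset only if the metadata entry may have been truncated. */
    void onMaterialized(long materializedSqn) {
        if (materializedSqn >= metadataSqn) {
            metaDataWritten = false;
        }
    }

    int metadataWrites() {
        return metadataWrites;
    }

    public static void main(String[] args) {
        MetadataFlagSketch logger = new MetadataFlagSketch();
        logger.logChange(5);      // first access writes metadata at SQN 5
        logger.onMaterialized(3); // metadata at SQN 5 is newer, flag stays set
        logger.logChange(6);      // no second metadata write
        logger.onMaterialized(10);// metadata at SQN 5 may be truncated, flag is reset
        logger.logChange(11);     // metadata is written again
        System.out.println(logger.metadataWrites());
    }
}
```

The simple variant costs one extra metadata write per materialization, which matches the ticket's remark that the optimization is probably not worth the extra bookkeeping.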
[jira] [Created] (FLINK-23169) Support user-level app staging directory when yarn.staging-directory is specified
jinfeng created FLINK-23169:

Summary: Support user-level app staging directory when yarn.staging-directory is specified
Key: FLINK-23169
URL: https://issues.apache.org/jira/browse/FLINK-23169
Project: Flink
Issue Type: Improvement
Components: Deployment / YARN
Reporter: jinfeng

When yarn.staging-directory is specified, different users will use the same directory as the staging directory. This may not be friendly for a job platform that submits jobs for different users. I propose to use a user-level directory by default when yarn.staging-directory is specified. We only need to make small changes to the `getStagingDir` function in YarnClusterDescriptor.
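One way to picture the proposed change is the small sketch below. It is purely illustrative: `StagingDirSketch` and `userLevelStagingDir` are hypothetical names, paths are handled as plain strings rather than Hadoop paths, and the layout `<staging-directory>/<user>` is an assumption, since the ticket does not spell out the exact directory structure.

```java
/** Hypothetical sketch, not the real YarnClusterDescriptor#getStagingDir. */
final class StagingDirSketch {
    /**
     * Derives a per-user staging directory from the configured one.
     * Assumed layout: configured directory followed by the user name.
     */
    static String userLevelStagingDir(String configuredStagingDir, String userName) {
        // Drop a single trailing slash so the result has exactly one separator.
        String base = configuredStagingDir.endsWith("/")
                ? configuredStagingDir.substring(0, configuredStagingDir.length() - 1)
                : configuredStagingDir;
        return base + "/" + userName;
    }

    public static void main(String[] args) {
        // Two users submitting through the same platform get separate directories.
        System.out.println(userLevelStagingDir("hdfs:///flink/staging", "alice"));
        System.out.println(userLevelStagingDir("hdfs:///flink/staging", "bob"));
    }
}
```

In a real change the user name would presumably come from the submitting user's identity on the cluster. That lookup is left out here on purpose.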
Re: [VOTE] FLIP-147: Support Checkpoint After Tasks Finished
+1 (binding)

Best,

Dawid

On 28/06/2021 10:45, Yun Gao wrote:
> Hi all,
>
> For FLIP-147 [1], which targets supporting checkpoints after tasks have finished and modifying the operator API and implementation to ensure the commit of the last piece of data: since we have had more discussions [2][3] and a few updates after the last vote, including changes to the PublicEvolving API, I'd like to have another VOTE on the current state of the FLIP.
>
> The vote will last at least 72 hours (Jul 1st), following the consensus voting process.
>
> thanks,
> Yun
>
> [1] https://cwiki.apache.org/confluence/x/mw-ZCQ
> [2] https://lists.apache.org/thread.html/r400da9898ff66fd613c25efea15de440a86f14758ceeae4950ea25cf%40%3Cdev.flink.apache.org
> [3] https://lists.apache.org/thread.html/r3953df796ef5ac67d5be9f2251a95ad72efbca31f1d1555d13e71197%40%3Cdev.flink.apache.org%3E
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
+1, thanks for updating the guidelines Xintong!

Cheers,
Till

On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo wrote:

> +1
>
> Thanks Xintong for drafting this doc.
>
> Best,
> Yangze Guo
>
> On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG wrote:
> >
> > Thanks Xintong for giving detailed documentation.
> >
> > The best practice for handling test failure is very detailed, it's a good guidelines document with clear action steps.
> >
> > +1 to Xintong's proposal.
> >
> > On Mon, Jun 28, 2021 at 4:07 PM Xintong Song wrote:
> >
> > > Thanks all for the discussion.
> > >
> > > Based on the opinions so far, I've drafted the new guidelines [1], as a potential replacement of the original wiki page [2].
> > >
> > > Hopefully this draft has covered the most opinions discussed and consensus made in this discussion thread.
> > >
> > > Looking forward to your feedback.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > > [1] https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > >
> > > [2] https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > >
> > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski wrote:
> > >
> > > > Thanks for the clarification Till. +1 for what you have written.
> > > >
> > > > Piotrek
> > > >
> > > > On Fri, Jun 25, 2021 at 16:00 Till Rohrmann wrote:
> > > >
> > > > > One quick note for clarification. I don't have anything against builds running on your personal Azure account and this is not what I understood under "local environment". For me "local environment" means that someone runs the test locally on his machine and then says that the tests have passed locally.
> > > > >
> > > > > I do agree that there might be a conflict of interests if a PR author disables tests. Here I would argue that we don't have malignant committers which means that every committer will probably first check the respective ticket for how often the test failed. Then I guess the next step would be to discuss on the ticket whether to disable it or not. And finally, after reaching a consensus, it will be disabled. If we see someone abusing this policy, then we can still think about how to guard against it. But, honestly, I have very rarely seen such a case. I am also ok to pull in the release manager to make the final call if this resolves concerns.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <pnowoj...@apache.org> wrote:
> > > > >
> > > > > > +1 for the general idea, however I have concerns about a couple of details.
> > > > > >
> > > > > > > I would first try to not introduce the exception for local builds.
> > > > > > > It makes it quite hard for others to verify the build and to make sure that the right things were executed.
> > > > > >
> > > > > > I would counter Till's proposal to ignore local green builds. If committer is merging and closing a PR, with official azure failure, but there was a green build before or in local azure it's IMO enough to leave the message:
> > > > > >
> > > > > > > Latest build failure is a known issue: FLINK-12345
> > > > > > > Green local build: URL
> > > > > >
> > > > > > This should address Till's concern about verification.
> > > > > >
> > > > > > On the other hand I have concerns about disabling tests. *It shouldn't be the PR author/committer that's disabling a test on his own, as that's a conflict of interests*. I have however no problems with disabling test instabilities that were marked as "blockers" though, that should work pretty well. But the important thing here is to correctly judge bumping priorities of test instabilities based on their frequency and current general health of the system. I believe that release managers should be playing a big role here in deciding on the guidelines of what should be a priority of certain test instabilities.
> > > > > >
> > > > > > What I mean by that is two example scenarios:
> > > > > > 1. if we have a handful of very frequently failing tests and a handful of very rarely failing tests (like one reported failure and no other occurrence in many months, and let's even say that the failure looks like infrastructure/network timeout), we should focus on the frequently failing ones, and probably we are safe to ignore for the time being the rare issues - at least until we deal with the most pressing ones.
> > > > > > 2. If
Re: Flink 1.14. Bi-weekly 2021-06-22
Thanks a lot for the update, Joe. This is very helpful!

Cheers,
Till

On Mon, Jun 28, 2021 at 10:10 AM Xintong Song wrote:

> Thanks for the update, Joe.
>
> Thank you~
>
> Xintong Song
>
> On Mon, Jun 28, 2021 at 3:54 PM Johannes Moser wrote:
>
> > Hello,
> >
> > Last Tuesday was our second bi-weekly.
> >
> > You can read up the outcome in the confluence wiki page [1].
> >
> > *Feature freeze date*
> > As we didn't come to a clear agreement, we will keep the anticipated feature freeze date as it is at early August.
> >
> > *Build stability*
> > The good thing: we decreased the number of issues, the not so good thing: only by ten.
> > We as a community need to put further effort into this.
> >
> > *Dependencies*
> > We'd like to ask all contributors to have a look at the components they are heavily involved with to see if any dependencies require updating. There were some issues recently to pass the security scans by some of the users. In future this should somehow be a default at the beginning of every release cycle.
> >
> > *Criteria for merging PRs*
> > We want to avoid merging PRs with unrelated CI failures. We are quite aware that we need to raise the importance of the Docker caching issue.
> >
> > What can you do to make the Flink 1.14. release a good one:
> > * Identify and update outdated dependencies
> > * Get rid of test instabilities
> > * Don't merge PRs including unrelated CI failures
> >
> > Best,
> > Joe
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/1.14+Release
Re: [DISCUSS] FLIP-172: Support custom transactional.id prefix in FlinkKafkaProducer
Sounds good from my side, please go ahead.

On Fri, Jun 25, 2021 at 5:31 PM Wenhao Ji wrote:

> Thanks Stephan and Piotr for your replies. It seems that there is no problem or concern about this feature. If there is no further objection, I will start a vote thread for FLIP-172.
>
> Thanks,
> Wenhao
>
> On Wed, Jun 23, 2021 at 3:41 PM Piotr Nowojski wrote:
> >
> > Hi,
> >
> > +1 from my side on this idea. I do not see any problems that could be caused by this change.
> >
> > Best,
> > Piotrek
> >
> > On Wed, Jun 23, 2021 at 08:59 Stephan Ewen wrote:
> >
> > > The motivation and the proposal sound good to me, +1 from my side.
> > >
> > > Would be good to have a quick opinion from someone who worked specifically with Kafka, maybe Becket or Piotr?
> > >
> > > Best,
> > > Stephan
> > >
> > > On Sat, Jun 12, 2021 at 9:50 AM Wenhao Ji wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> I would like to open this discussion thread to talk about FLIP-172
> > >> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-172%3A+Support+custom+transactional.id+prefix+in+FlinkKafkaProducer>,
> > >> which aims to provide a way to support specifying a custom transactional.id in the FlinkKafkaProducer class.
> > >>
> > >> I am looking forward to your feedback and suggestions!
> > >>
> > >> Thanks,
> > >> Wenhao
[jira] [Created] (FLINK-23168) Catalog shouldn't merge properties for alter DB operation
Rui Li created FLINK-23168: -- Summary: Catalog shouldn't merge properties for alter DB operation Key: FLINK-23168 URL: https://issues.apache.org/jira/browse/FLINK-23168 Project: Flink Issue Type: Improvement Components: Connectors / Hive, Table SQL / API Reporter: Rui Li -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
+1 Thanks Xintong for drafting this doc. Best, Yangze Guo On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG wrote: > > Thanks Xintong for giving detailed documentation. > > The best practice for handling test failure is very detailed, it's a good > guidelines document with clear action steps. > > +1 to Xintong's proposal. > > Xintong Song 于2021年6月28日周一 下午4:07写道: > > > Thanks all for the discussion. > > > > Based on the opinions so far, I've drafted the new guidelines [1], as a > > potential replacement of the original wiki page [2]. > > > > Hopefully this draft has covered the most opinions discussed and consensus > > made in this discussion thread. > > > > Looking forward to your feedback. > > > > Thank you~ > > > > Xintong Song > > > > > > [1] > > > > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing > > > > [2] > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski > > wrote: > > > > > Thanks for the clarification Till. +1 for what you have written. > > > > > > Piotrek > > > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann > > napisał(a): > > > > > > > One quick note for clarification. I don't have anything against builds > > > > running on your personal Azure account and this is not what I > > understood > > > > under "local environment". For me "local environment" means that > > someone > > > > runs the test locally on his machine and then says that the > > > > tests have passed locally. > > > > > > > > I do agree that there might be a conflict of interests if a PR author > > > > disables tests. Here I would argue that we don't have malignant > > > committers > > > > which means that every committer will probably first check the > > respective > > > > ticket for how often the test failed. Then I guess the next step would > > be > > > > to discuss on the ticket whether to disable it or not. 
And finally, > > after > > > > reaching a consensus, it will be disabled. If we see someone abusing > > this > > > > policy, then we can still think about how to guard against it. But, > > > > honestly, I have very rarely seen such a case. I am also ok to pull in > > > the > > > > release manager to make the final call if this resolves concerns. > > > > > > > > Cheers, > > > > Till > > > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski > > > > wrote: > > > > > > > > > +1 for the general idea, however I have concerns about a couple of > > > > details. > > > > > > > > > > > I would first try to not introduce the exception for local builds. > > > > > > It makes it quite hard for others to verify the build and to make > > > sure > > > > > that the right things were executed. > > > > > > > > > > I would counter Till's proposal to ignore local green builds. If > > > > committer > > > > > is merging and closing a PR, with official azure failure, but there > > > was a > > > > > green build before or in local azure it's IMO enough to leave the > > > > message: > > > > > > > > > > > Latest build failure is a known issue: FLINK-12345 > > > > > > Green local build: URL > > > > > > > > > > This should address Till's concern about verification. > > > > > > > > > > On the other hand I have concerns about disabling tests.* It > > shouldn't > > > be > > > > > the PR author/committer that's disabling a test on his own, as > > that's a > > > > > conflict of interests*. I have however no problems with disabling > > test > > > > > instabilities that were marked as "blockers" though, that should work > > > > > pretty well. But the important thing here is to correctly judge > > bumping > > > > > priorities of test instabilities based on their frequency and current > > > > > general health of the system. 
I believe that release managers should > > be > > > > > playing a big role here in deciding on the guidelines of what should > > > be a > > > > > priority of certain test instabilities. > > > > > > > > > > What I mean by that is two example scenarios: > > > > > 1. if we have a handful of very frequently failing tests and a > > handful > > > of > > > > > very rarely failing tests (like one reported failure and no another > > > > > occurrence in many months, and let's even say that the failure looks > > > like > > > > > infrastructure/network timeout), we should focus on the frequently > > > > failing > > > > > ones, and probably we are safe to ignore for the time being the rare > > > > issues > > > > > - at least until we deal with the most pressing ones. > > > > > 2. If we have tons of rarely failing test instabilities, we should > > > > probably > > > > > start addressing them as blockers. > > > > > > > > > > I'm using my own conscious and my best judgement when I'm > > > > > bumping/decreasing priorities of test instabilities (and bugs), but > > as > > > > > individual committer I don't have the full picture. As I wrote > > above, I > > > > > think release managers are in a much better position to keep
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
Thanks Xintong for giving detailed documentation. The best practice for handling test failure is very detailed, it's a good guidelines document with clear action steps. +1 to Xintong's proposal. Xintong Song 于2021年6月28日周一 下午4:07写道: > Thanks all for the discussion. > > Based on the opinions so far, I've drafted the new guidelines [1], as a > potential replacement of the original wiki page [2]. > > Hopefully this draft has covered the most opinions discussed and consensus > made in this discussion thread. > > Looking forward to your feedback. > > Thank you~ > > Xintong Song > > > [1] > > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing > > [2] > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests > > > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski > wrote: > > > Thanks for the clarification Till. +1 for what you have written. > > > > Piotrek > > > > pt., 25 cze 2021 o 16:00 Till Rohrmann > napisał(a): > > > > > One quick note for clarification. I don't have anything against builds > > > running on your personal Azure account and this is not what I > understood > > > under "local environment". For me "local environment" means that > someone > > > runs the test locally on his machine and then says that the > > > tests have passed locally. > > > > > > I do agree that there might be a conflict of interests if a PR author > > > disables tests. Here I would argue that we don't have malignant > > committers > > > which means that every committer will probably first check the > respective > > > ticket for how often the test failed. Then I guess the next step would > be > > > to discuss on the ticket whether to disable it or not. And finally, > after > > > reaching a consensus, it will be disabled. If we see someone abusing > this > > > policy, then we can still think about how to guard against it. But, > > > honestly, I have very rarely seen such a case. 
I am also ok to pull in > > the > > > release manager to make the final call if this resolves concerns. > > > > > > Cheers, > > > Till > > > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski > > > wrote: > > > > > > > +1 for the general idea, however I have concerns about a couple of > > > details. > > > > > > > > > I would first try to not introduce the exception for local builds. > > > > > It makes it quite hard for others to verify the build and to make > > sure > > > > that the right things were executed. > > > > > > > > I would counter Till's proposal to ignore local green builds. If > > > committer > > > > is merging and closing a PR, with official azure failure, but there > > was a > > > > green build before or in local azure it's IMO enough to leave the > > > message: > > > > > > > > > Latest build failure is a known issue: FLINK-12345 > > > > > Green local build: URL > > > > > > > > This should address Till's concern about verification. > > > > > > > > On the other hand I have concerns about disabling tests.* It > shouldn't > > be > > > > the PR author/committer that's disabling a test on his own, as > that's a > > > > conflict of interests*. I have however no problems with disabling > test > > > > instabilities that were marked as "blockers" though, that should work > > > > pretty well. But the important thing here is to correctly judge > bumping > > > > priorities of test instabilities based on their frequency and current > > > > general health of the system. I believe that release managers should > be > > > > playing a big role here in deciding on the guidelines of what should > > be a > > > > priority of certain test instabilities. > > > > > > > > What I mean by that is two example scenarios: > > > > 1. 
if we have a handful of very frequently failing tests and a > handful > > of > > > > very rarely failing tests (like one reported failure and no another > > > > occurrence in many months, and let's even say that the failure looks > > like > > > > infrastructure/network timeout), we should focus on the frequently > > > failing > > > > ones, and probably we are safe to ignore for the time being the rare > > > issues > > > > - at least until we deal with the most pressing ones. > > > > 2. If we have tons of rarely failing test instabilities, we should > > > probably > > > > start addressing them as blockers. > > > > > > > > I'm using my own conscious and my best judgement when I'm > > > > bumping/decreasing priorities of test instabilities (and bugs), but > as > > > > individual committer I don't have the full picture. As I wrote > above, I > > > > think release managers are in a much better position to keep > adjusting > > > > those kind of guidelines. > > > > > > > > Best, Piotrek > > > > > > > > pt., 25 cze 2021 o 08:10 Yu Li napisał(a): > > > > > > > > > +1 for Xintong's proposal. > > > > > > > > > > For me, resolving problems directly (fixing the infrastructure > issue, > > > > > disabling unstable tests and creating blocker JIRAs to track the > fix > > > and > > > > >
[jira] [Created] (FLINK-23167) Port Kinesis Table API e2e tests to release-1.12 branch
Emre Kartoglu created FLINK-23167: - Summary: Port Kinesis Table API e2e tests to release-1.12 branch Key: FLINK-23167 URL: https://issues.apache.org/jira/browse/FLINK-23167 Project: Flink Issue Type: Improvement Components: Connectors / Kinesis Affects Versions: 1.12.6 Reporter: Emre Kartoglu https://issues.apache.org/jira/browse/FLINK-20042 added e2e tests for the Kinesis Table API. This was only done for versions >=1.13 however. We need to port these tests to the release-1.12 branch as version 1.12 supports the same functionality that needs the same (or a similar) test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[VOTE] FLIP-147: Support Checkpoint After Tasks Finished
Hi all, FLIP-147 [1] aims to support checkpoints after tasks have finished and to modify the operator API and implementation to ensure that the last piece of data is committed. Since the last vote we have had more discussions [2][3] and a few updates, including changes to the PublicEvolving API, so I'd like to hold another VOTE on the current state of the FLIP. The vote will last at least 72 hours (until Jul 1st), following the consensus voting process. Thanks, Yun [1] https://cwiki.apache.org/confluence/x/mw-ZCQ [2] https://lists.apache.org/thread.html/r400da9898ff66fd613c25efea15de440a86f14758ceeae4950ea25cf%40%3Cdev.flink.apache.org [3] https://lists.apache.org/thread.html/r3953df796ef5ac67d5be9f2251a95ad72efbca31f1d1555d13e71197%40%3Cdev.flink.apache.org%3E
Re: Flink 1.14. Bi-weekly 2021-06-22
Thanks for the update, Joe. Thank you~ Xintong Song On Mon, Jun 28, 2021 at 3:54 PM Johannes Moser wrote: > Hello, > > Last Tuesday was our second bi-weekly. > > You can read up the outcome in the confluence wiki page [1]. > > *Feature freeze date* > As we didn't come to a clear agreement, we will keep the anticipated > feature freeze date > as it is at early August. > > *Build stability* > The good thing: we decreased the number of issues, the not so good thing: > only by ten. > We as a community need to put further effort into this. > > *Dependencies* > We'd like to ask all contributors to have a look at the components they > are heavily > Involved with to see if any dependencies require updating. There were some > Issues recently to pass the security scans by some of the users. In future > this should > somehow be a default at the beginning of every release cycle. > > *Criteria for merging PRs* > We want to avoid merging PRs with unrelated CI failures. We are quite > aware that we > need to raise the importance of the Docker caching issue. > > What can you do to make the Flink 1.14. release a good one: > * Identify and update outdated dependencies > * Get rid of test instabilities > * Don't merge PRs including unrelated CI failures > > Best, > Joe > > > [1] https://cwiki.apache.org/confluence/display/FLINK/1.14+Release
Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.
Thanks all for the discussion. Based on the opinions so far, I've drafted the new guidelines [1], as a potential replacement of the original wiki page [2]. Hopefully this draft has covered the most opinions discussed and consensus made in this discussion thread. Looking forward to your feedback. Thank you~ Xintong Song [1] https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing [2] https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski wrote: > Thanks for the clarification Till. +1 for what you have written. > > Piotrek > > pt., 25 cze 2021 o 16:00 Till Rohrmann napisał(a): > > > One quick note for clarification. I don't have anything against builds > > running on your personal Azure account and this is not what I understood > > under "local environment". For me "local environment" means that someone > > runs the test locally on his machine and then says that the > > tests have passed locally. > > > > I do agree that there might be a conflict of interests if a PR author > > disables tests. Here I would argue that we don't have malignant > committers > > which means that every committer will probably first check the respective > > ticket for how often the test failed. Then I guess the next step would be > > to discuss on the ticket whether to disable it or not. And finally, after > > reaching a consensus, it will be disabled. If we see someone abusing this > > policy, then we can still think about how to guard against it. But, > > honestly, I have very rarely seen such a case. I am also ok to pull in > the > > release manager to make the final call if this resolves concerns. > > > > Cheers, > > Till > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski > > wrote: > > > > > +1 for the general idea, however I have concerns about a couple of > > details. > > > > > > > I would first try to not introduce the exception for local builds. 
> > > > It makes it quite hard for others to verify the build and to make > sure > > > that the right things were executed. > > > > > > I would counter Till's proposal to ignore local green builds. If > > committer > > > is merging and closing a PR, with official azure failure, but there > was a > > > green build before or in local azure it's IMO enough to leave the > > message: > > > > > > > Latest build failure is a known issue: FLINK-12345 > > > > Green local build: URL > > > > > > This should address Till's concern about verification. > > > > > > On the other hand I have concerns about disabling tests.* It shouldn't > be > > > the PR author/committer that's disabling a test on his own, as that's a > > > conflict of interests*. I have however no problems with disabling test > > > instabilities that were marked as "blockers" though, that should work > > > pretty well. But the important thing here is to correctly judge bumping > > > priorities of test instabilities based on their frequency and current > > > general health of the system. I believe that release managers should be > > > playing a big role here in deciding on the guidelines of what should > be a > > > priority of certain test instabilities. > > > > > > What I mean by that is two example scenarios: > > > 1. if we have a handful of very frequently failing tests and a handful > of > > > very rarely failing tests (like one reported failure and no another > > > occurrence in many months, and let's even say that the failure looks > like > > > infrastructure/network timeout), we should focus on the frequently > > failing > > > ones, and probably we are safe to ignore for the time being the rare > > issues > > > - at least until we deal with the most pressing ones. > > > 2. If we have tons of rarely failing test instabilities, we should > > probably > > > start addressing them as blockers. 
> > > > > > I'm using my own conscience and my best judgement when I'm > > > bumping/decreasing priorities of test instabilities (and bugs), but as > > > an individual committer I don't have the full picture. As I wrote above, I > > > think release managers are in a much better position to keep adjusting > > > those kinds of guidelines. > > > > > > Best, Piotrek > > > > > > On Fri, Jun 25, 2021 at 08:10 Yu Li wrote: > > > > > > > +1 for Xintong's proposal. > > > > > > > > For me, resolving problems directly (fixing the infrastructure issue, > > > > disabling unstable tests and creating blocker JIRAs to track the fix > > and > > > > re-enable them asap, etc.) is (in most cases) better than working > > around > > > > them (verify locally, manually check and judge the failure as > > > "unrelated", > > > > etc.), and I believe the proposal could help us push those more > > "real" > > > > solutions forward. > > > > > > > > Best Regards, > > > > Yu > > > > > > > > > > > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo wrote: > > > > > > > > > Creating a blocker issue for the manually disabled tests sounds > good > > to > > > > me.
Re: [Discuss] Planning Flink 1.14
Hi all, We discussed the matter again in our latest release planning (see [1]). We see a lot of valid points in this thread. We were not able to come to a clear conclusion within the meeting, and most of the arguments mentioned will still be valid even if we extend the feature freeze by a month, so we are keeping it at early August for now. I will collect all the inputs and talk to some users to further improve the experience also for those who extended Flink. Best Joe [1] https://cwiki.apache.org/confluence/display/FLINK/1.14+Release > On 07.06.2021, at 05:30, Benchao Li wrote: > > Hi all, > > Thanks Xintong for bringing this up. > > I would like to share some experience of the usage of Flink in our company > (ByteDance). > > 1. We started building our SQL platform in mid 2019, using v1.9 blink > planner, and it's amazing. > Also we added many internal features which are still missing in this > version, including DDL/Computed Column/ > a lot of internal formats and connectors, and some other planner changes. > > 2. In early 2020, we planned to upgrade to v1.10. Before we finished > cherry-picking internal commits to v1.10, we found > that v1.11 is going to be released soon. Hence we decided to upgrade to > v1.11. > Till late 2020, we almost finished internal feature cherry-picking work. (It > takes us so long because we were still adding new features > to our online version v1.9 at the same time) > > 3. Now > Although we tried a lot of work to reduce the overhead for our users of > upgrading from v1.9 to v1.11, this process is still slow, because: > a) All the connectors/formats properties changed (although we have a tool > for them to upgrade in one click, they still have a lot of learning cost) > b) The checkpoint cannot be upgraded > > 4. Future > We have 5000+ online SQL jobs and hundreds of commits, we do not plan to do > an upgrade in short term. 
> However v1.11 still lacks a lot of features, for example: > a) new UDF type inference does not support aggregate function > b) FLIP-27 new source interface cannot be used in SQL > We may need to do a lot of cherry-picking to our v1.11 > > So, from our point of view, a longer release cycle and more fully finished features > may benefit us a lot. > > > On Fri, Jun 4, 2021 at 6:02 PM JING ZHANG wrote: > >> Hi all, >> >> @Xintong Song >> Thanks for reminding me, I will contact Jark to update the wiki page. >> >> Besides, I'd like to provide more input by sharing our experience of >> upgrading our internal version of Flink. >> >> Flink has been widely used in the production environment since 2018 in our >> company. Our internal version is far behind the latest stable version of >> the community by about 1 year. We upgraded the internal Flink version to >> 1.10 version in March last year, and we plan to upgrade directly to 1.13 >> next month (missed 1.11 and 1.12 versions). We wish to use the latest >> version as soon as possible. However, in fact we follow up with the >> community's latest stable release version almost once a year because >> upgrading to a new version is a time-consuming process. >> >> I list the detailed work as follows. >> >> a. Before releasing a new internal version >> 1) Required: Cherry-pick internal features to the new Flink branch. A few >> features need to be redeveloped based on the new branch code base. >> BTW, the cost will become heavier and heavier since we maintain more and >> more internal features in our internal version. >> 2) Optional: Some internal connectors need to adapt to the new API >> 3) Required: Surrounding products need to be updated based on the new API, for >> example, the internal Flink SQL web development platform >> 4) Required: Regression tests >> >> b. 
After release, encourage users to upgrade existing jobs (thousands of >> jobs) to the new version. Users need some time to: >> 1) Repackage jar for dataStream job >> 2) For critical jobs, users need to run jobs on both versions at the >> same time for a while. They migrate to the new job only after comparing the >> data carefully. >> 3) Pure ETL SQL jobs are easy to bump up. But other Flink SQL jobs with >> stateful operators need extra effort because Flink SQL jobs do not >> support state compatibility yet. >> >> Best regards, >> JING ZHANG >> >> On Fri, Jun 4, 2021 at 2:27 PM Prasanna kumar wrote: >> >>> Hi all, >>> >>> We are using Flink for our eventing system. Overall we are very happy >> with >>> the tech, documentation and community support and quick replies in mails. >>> >>> My experience with versions over the last year: >>> >>> We were working on 1.10 initially during our research phase, then we >>> stabilised with 1.11 as we moved on, but by the time we were about to get >>> into production 1.12 was released. As with all software and products, >>> there were bugs reported. So we waited till 1.12.2 was released and then >>> upgraded. Within a month of us doing it 1.13 got released. >>> >>> But from past experience, we waited till at least a couple of minor >>> versions (fixing bugs)
Flink 1.14. Bi-weekly 2021-06-22
Hello,

Last Tuesday was our second bi-weekly.

You can read up on the outcome on the Confluence wiki page [1].

*Feature freeze date*
As we didn't come to a clear agreement, we will keep the anticipated feature freeze date as it is, at early August.

*Build stability*
The good thing: we decreased the number of issues. The not so good thing: only by ten.
We as a community need to put further effort into this.

*Dependencies*
We'd like to ask all contributors to have a look at the components they are heavily involved with to see if any dependencies require updating. Some users recently had issues passing security scans. In the future, this should be a default step at the beginning of every release cycle.

*Criteria for merging PRs*
We want to avoid merging PRs with unrelated CI failures. We are quite aware that we need to raise the importance of the Docker caching issue.

What you can do to make the Flink 1.14 release a good one:
* Identify and update outdated dependencies
* Get rid of test instabilities
* Don't merge PRs including unrelated CI failures

Best,
Joe

[1] https://cwiki.apache.org/confluence/display/FLINK/1.14+Release
[jira] [Created] (FLINK-23166) ZipUtils doesn't handle properly for softlinks inside the zip file
Dian Fu created FLINK-23166: --- Summary: ZipUtils doesn't handle properly for softlinks inside the zip file Key: FLINK-23166 URL: https://issues.apache.org/jira/browse/FLINK-23166 Project: Flink Issue Type: Bug Components: API / Python Affects Versions: 1.10.0 Reporter: Dian Fu Assignee: Dian Fu Fix For: 1.11.4, 1.14.0, 1.12.5, 1.13.2 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23165) Add StreamExecutionEnvironment#registerSlotSharingGroup to PyFlink
Yangze Guo created FLINK-23165: -- Summary: Add StreamExecutionEnvironment#registerSlotSharingGroup to PyFlink Key: FLINK-23165 URL: https://issues.apache.org/jira/browse/FLINK-23165 Project: Flink Issue Type: Sub-task Components: API / Python Reporter: Yangze Guo Fix For: 1.14.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)