[jira] [Created] (FLINK-23174) Log improvement in Task throws Error

2021-06-28 Thread Bo Cui (Jira)
Bo Cui created FLINK-23174:
--

 Summary: Log improvement in Task throws Error
 Key: FLINK-23174
 URL: https://issues.apache.org/jira/browse/FLINK-23174
 Project: Flink
  Issue Type: Improvement
Reporter: Bo Cui


We have seen channels close due to network jitter, which made tasks fail.

Currently, we can only see which remote channel caused the task/job failure.

We cannot see more details, such as which channel was closed, the task stack...
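As a purely illustrative sketch of the extra context this report asks for, the hypothetical helper below builds a single error message from the task name, the local and remote addresses of the closed channel, and the full cause chain. The class, method and parameter names are assumptions for illustration; they are not part of Flink's network stack.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Flink's network stack. */
final class ChannelFailureMessage {

    /** Builds one log message with the channel endpoints, the task name and the cause chain. */
    static String describe(String taskName, String localAddress, String remoteAddress, Throwable cause) {
        StringBuilder sb = new StringBuilder()
                .append("Channel closed: task=").append(taskName)
                .append(", local=").append(localAddress)
                .append(", remote=").append(remoteAddress);
        // Walk the cause chain (guarding against cycles) so the root cause is visible.
        List<Throwable> seen = new ArrayList<>();
        for (Throwable t = cause; t != null && !seen.contains(t); t = t.getCause()) {
            seen.add(t);
            sb.append("\n  caused by: ").append(t.getClass().getName())
              .append(": ").append(t.getMessage());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Throwable root = new java.io.IOException("Connection reset by peer");
        Throwable failure = new RuntimeException("Remote channel closed", root);
        System.out.println(describe("Map (1/4)", "10.0.0.1:40000", "10.0.0.2:41000", failure));
    }
}
```

The full stack trace would still come from passing the `Throwable` itself to the logger; the sketch only shows which identifying fields could accompany it.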



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-23173) Inconsistency detected by ld.so

2021-06-28 Thread Xintong Song (Jira)
Xintong Song created FLINK-23173:


 Summary: Inconsistency detected by ld.so
 Key: FLINK-23173
 URL: https://issues.apache.org/jira/browse/FLINK-23173
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.11.3
Reporter: Xintong Song


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=19647=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=2b7514ee-e706-5046-657b-3430666e7bd9=7236

The test fails because one of the TMs terminated unexpectedly.
The following error message is found in the stdout of the problematic TM.

{code}
Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: 
Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!
{code}





[jira] [Created] (FLINK-23172) Links of restart strategy in configuration page is broken

2021-06-28 Thread Zhilong Hong (Jira)
Zhilong Hong created FLINK-23172:


 Summary: Links of restart strategy in configuration page is broken
 Key: FLINK-23172
 URL: https://issues.apache.org/jira/browse/FLINK-23172
 Project: Flink
  Issue Type: Technical Debt
  Components: Documentation
Affects Versions: 1.14.0
Reporter: Zhilong Hong
 Fix For: 1.14.0


The links in the Fault Tolerance section of [the configuration 
page|https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/config/#fault-tolerance/]
 are broken. Currently, the link refers to 
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/dev/task_failure_recovery.html#fixed-delay-restart-strategy,
 which does not exist and leads to a 404 error. The correct link is 
https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/state/task_failure_recovery/#fixed-delay-restart-strategy.





[DISCUSS] Better user experience in the WindowAggregate upon Changelog (contains update message)

2021-06-28 Thread JING ZHANG
When WindowAggregate works on a changelog that contains update messages,
UPDATE_BEFORE (UB) messages may be dropped as late messages [1].

To handle late UB messages, users need to set *all* of the following 3
parameters:

(1) enable late fire by setting

table.exec.emit.late-fire.enabled : true

(2) set per-record emit behavior for late records by setting

table.exec.emit.late-fire.delay : 0 s

(3) keep window state for extra time after the window fires by setting

table.exec.emit.allow-lateness : 1 h  // or table.exec.state.ttl : 1h
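The "all 3 parameters" requirement can be written down as a small check. This is a hypothetical Java sketch over a plain `Map` of option strings, not Flink's `Configuration` API; accepting either `table.exec.emit.allow-lateness` or `table.exec.state.ttl` for step (3) follows the note on that step, and treating `0`, `0 s` and `0 ms` as per-record emission is a simplification.

```java
import java.util.Map;

/** Hypothetical check over plain option strings; not Flink's Configuration API. */
final class LateUpdateBeforeCheck {
    static final String LATE_FIRE_ENABLED = "table.exec.emit.late-fire.enabled";
    static final String LATE_FIRE_DELAY = "table.exec.emit.late-fire.delay";
    static final String ALLOW_LATENESS = "table.exec.emit.allow-lateness";
    static final String STATE_TTL = "table.exec.state.ttl";

    /** True only if steps (1), (2) and (3) are all configured. */
    static boolean handlesLateUpdateBefore(Map<String, String> conf) {
        // (1) late fire must be enabled
        boolean lateFire = "true".equalsIgnoreCase(conf.get(LATE_FIRE_ENABLED));
        // (2) per-record emission means a zero delay; whitespace is ignored ("0 s" == "0s")
        String delay = conf.getOrDefault(LATE_FIRE_DELAY, "").replaceAll("\\s+", "");
        boolean perRecord = delay.equals("0") || delay.equals("0s") || delay.equals("0ms");
        // (3) window state must be kept after firing, via allow-lateness or state TTL
        boolean keepsState = conf.containsKey(ALLOW_LATENESS) || conf.containsKey(STATE_TTL);
        return lateFire && perRecord && keepsState;
    }

    public static void main(String[] args) {
        Map<String, String> all = Map.of(
                LATE_FIRE_ENABLED, "true",
                LATE_FIRE_DELAY, "0 s",
                ALLOW_LATENESS, "1 h");
        System.out.println(handlesLateUpdateBefore(all));                           // true
        System.out.println(handlesLateUpdateBefore(Map.of(ALLOW_LATENESS, "1 h"))); // false
    }
}
```

The second call illustrates disadvantage (1) below: a user who only sets the lateness still loses late UB messages.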


The current solution has two disadvantages:

(1) Users may not realize that UB messages may be dropped as late events,
so they will not set the related parameters.

(2) When users look for a way to solve the dropped UB messages problem,
the current solution is inconvenient because they need to set all 3
parameters. Besides, some of these configurations overlap in functionality.


There are two proposals to simplify the 3 parameters.

(1) Users only need to set `table.exec.emit.allow-lateness` (just like on
DataStream, where users only need to set the allowed lateness); the framework
could automatically set `table.exec.emit.late-fire.enabled` to true and
`table.exec.emit.late-fire.delay` to 0s.

In a later version, we would deprecate `table.exec.emit.late-fire.delay`
and `table.exec.emit.late-fire.enabled`.


(2) Users need to set `table.exec.emit.late-fire.enabled` to true and set
`table.exec.state.ttl`; the framework could automatically set
`table.exec.emit.late-fire.delay` to 0s.

In a later version, we would deprecate `table.exec.emit.late-fire.delay`
and `table.exec.emit.allow-lateness`.
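To make the difference between the two proposals concrete, here is a hypothetical Java sketch of the option derivation each one implies, again over a plain `Map` rather than Flink's real option handling. Using `putIfAbsent`, so that explicitly set user values are never overwritten, is an assumption of this sketch; the proposals themselves do not say how conflicting explicit values are treated.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the two proposals; not Flink's actual option handling. */
final class LateFireDefaults {
    static final String LATE_FIRE_ENABLED = "table.exec.emit.late-fire.enabled";
    static final String LATE_FIRE_DELAY = "table.exec.emit.late-fire.delay";
    static final String ALLOW_LATENESS = "table.exec.emit.allow-lateness";
    static final String STATE_TTL = "table.exec.state.ttl";

    /** Proposal (1): setting allow-lateness alone implies late fire with per-record emission. */
    static Map<String, String> applyProposal1(Map<String, String> userConf) {
        Map<String, String> conf = new HashMap<>(userConf);
        if (conf.containsKey(ALLOW_LATENESS)) {
            // putIfAbsent: values the user set explicitly are left untouched (an assumption).
            conf.putIfAbsent(LATE_FIRE_ENABLED, "true");
            conf.putIfAbsent(LATE_FIRE_DELAY, "0 s");
        }
        return conf;
    }

    /** Proposal (2): late fire plus a state TTL implies per-record emission. */
    static Map<String, String> applyProposal2(Map<String, String> userConf) {
        Map<String, String> conf = new HashMap<>(userConf);
        boolean lateFire = "true".equalsIgnoreCase(conf.get(LATE_FIRE_ENABLED));
        if (lateFire && conf.containsKey(STATE_TTL)) {
            conf.putIfAbsent(LATE_FIRE_DELAY, "0 s");
        }
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(applyProposal1(Map.of(ALLOW_LATENESS, "1 h")));
        System.out.println(applyProposal2(Map.of(LATE_FIRE_ENABLED, "true", STATE_TTL, "1h")));
    }
}
```

Proposal (1) asks the user for one option, proposal (2) for two; in both, the per-record delay no longer has to be set by hand.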


Please let me know what you think about the issue.

Thank you.

[1] https://issues.apache.org/jira/browse/FLINK-22781


Best regards,
JING ZHANG


Re: [VOTE] FLIP-147: Support Checkpoint After Tasks Finished

2021-06-28 Thread 刘建刚
+1 (binding)

Best
liujiangang



Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-28 Thread 刘建刚
+1 for the proposal. Since the tests take a long time and the environment may
vary, unstable tests are really annoying for developers. This solution is welcome.

Best
liujiangang


Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-28 Thread Jingsong Li
+1 Thanks Xintong for the update!

Best,
Jingsong


Re: [VOTE] FLIP-172: Support custom transactional.id prefix in FlinkKafkaProducer

2021-06-28 Thread Arvid Heise
+1 (binding)



Re: [VOTE] FLIP-147: Support Checkpoint After Tasks Finished

2021-06-28 Thread Piotr Nowojski
+1 (binding)

Piotrek



Re: [VOTE] FLIP-172: Support custom transactional.id prefix in FlinkKafkaProducer

2021-06-28 Thread Piotr Nowojski
+1 (binding)

Piotrek



[jira] [Created] (FLINK-23171) Can't execute SET table.sql-dialect=hive;

2021-06-28 Thread JasonLee (Jira)
JasonLee created FLINK-23171:


 Summary: Can't execute SET table.sql-dialect=hive; 
 Key: FLINK-23171
 URL: https://issues.apache.org/jira/browse/FLINK-23171
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Affects Versions: 1.13.1
 Environment: Flink 1.13.1, Hive 2.3.4
Reporter: JasonLee
 Fix For: 1.14.0


The SQL client throws an exception when I switch dialects like this:

SET table.sql-dialect=hive;

The exception is as follows:

{code:java}
Exception in thread "main" org.apache.flink.table.client.SqlClientException: Unexpected exception. This is a bug. Please consider filing an issue.
	at org.apache.flink.table.client.SqlClient.startClient(SqlClient.java:201)
	at org.apache.flink.table.client.SqlClient.main(SqlClient.java:161)
Caused by: java.lang.BootstrapMethodError: java.lang.NoSuchMethodError: org.apache.flink.table.planner.delegation.PlannerContext.createSqlExprToRexConverter(Lorg/apache/calcite/rel/type/RelDataType;)Lorg/apache/flink/table/planner/calcite/SqlExprToRexConverter;
	at org.apache.flink.table.planner.delegation.hive.HiveParserFactory.create(HiveParserFactory.java:39)
	at org.apache.flink.table.planner.delegation.PlannerBase.createNewParser(PlannerBase.scala:144)
	at org.apache.flink.table.planner.delegation.PlannerBase.getParser(PlannerBase.scala:149)
	at org.apache.flink.table.api.internal.TableEnvironmentImpl.getParser(TableEnvironmentImpl.java:1466)
	at org.apache.flink.table.api.internal.TableEnvironmentImpl.<init>(TableEnvironmentImpl.java:237)
	at org.apache.flink.table.api.bridge.java.internal.StreamTableEnvironmentImpl.<init>(StreamTableEnvironmentImpl.java:113)
	at org.apache.flink.table.client.gateway.context.ExecutionContext.createStreamTableEnvironment(ExecutionContext.java:156)
	at org.apache.flink.table.client.gateway.context.ExecutionContext.createTableEnvironment(ExecutionContext.java:116)
	at org.apache.flink.table.client.gateway.context.ExecutionContext.<init>(ExecutionContext.java:82)
	at org.apache.flink.table.client.gateway.context.SessionContext.set(SessionContext.java:156)
	at org.apache.flink.table.client.gateway.local.LocalExecutor.setSessionProperty(LocalExecutor.java:164)
	at org.apache.flink.table.client.cli.CliClient.callSet(CliClient.java:456)
	at org.apache.flink.table.client.cli.CliClient.callOperation(CliClient.java:403)
	at org.apache.flink.table.client.cli.CliClient.lambda$executeStatement$0(CliClient.java:327)
	at java.util.Optional.ifPresent(Optional.java:159)
	at org.apache.flink.table.client.cli.CliClient.executeStatement(CliClient.java:327)
	at org.apache.flink.table.client.cli.CliClient.executeInteractive(CliClient.java:297)
	at org.apache.flink.table.client.cli.CliClient.executeInInteractiveMode(CliClient.java:221)
	at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:151)
	at org.apache.flink.table.client.SqlClient.start(SqlClient.java:95)
	at org.apache.flink.table.client.SqlClient.startClient(SqlClient.java:187)
	... 1 more
Caused by: java.lang.NoSuchMethodError: org.apache.flink.table.planner.delegation.PlannerContext.createSqlExprToRexConverter(Lorg/apache/calcite/rel/type/RelDataType;)Lorg/apache/flink/table/planner/calcite/SqlExprToRexConverter;
	at java.lang.invoke.MethodHandleNatives.resolve(Native Method)
	at java.lang.invoke.MemberName$Factory.resolve(MemberName.java:975)
	at java.lang.invoke.MemberName$Factory.resolveOrFail(MemberName.java:1000)
	at java.lang.invoke.MethodHandles$Lookup.resolveOrFail(MethodHandles.java:1389)
	at java.lang.invoke.MethodHandles$Lookup.linkMethodHandleConstant(MethodHandles.java:1745)
	at java.lang.invoke.MethodHandleNatives.linkMethodHandleConstant(MethodHandleNatives.java:477)
	... 22 more
{code}
I guess there is a package conflict, but I can execute the same statement in version 1.13.0.
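A `NoSuchMethodError` usually means that a class was compiled against one version of a dependency but is linked at runtime against another, which fits the suspected package conflict. As a hypothetical diagnostic sketch (not a Flink tool), the program below reports which jar or directory a class was loaded from and whether that copy declares a method with a given name. It relies only on the standard `Class.forName`, `Class.getProtectionDomain()` and `CodeSource.getLocation()` APIs.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.util.Arrays;

/** Hypothetical diagnostic: where does a class come from, and does it declare a method? */
final class ClassOrigin {

    private static Class<?> load(String className) {
        try {
            // initialize=false: inspect the class without running its static initializers.
            return Class.forName(className, false, ClassOrigin.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Not on classpath: " + className, e);
        }
    }

    /** Jar/directory URL of the class, or "bootstrap" if no code source is recorded (JDK core classes). */
    static String locate(String className) {
        CodeSource source = load(className).getProtectionDomain().getCodeSource();
        return (source == null || source.getLocation() == null)
                ? "bootstrap"
                : source.getLocation().toString();
    }

    /** True if the class declares a method with this name, regardless of parameter types. */
    static boolean declaresMethod(String className, String methodName) {
        return Arrays.stream(load(className).getDeclaredMethods())
                .map(Method::getName)
                .anyMatch(methodName::equals);
    }

    public static void main(String[] args) {
        String className = args.length > 0 ? args[0] : "java.lang.String";
        String methodName = args.length > 1 ? args[1] : "isEmpty";
        System.out.println(className + " loaded from: " + locate(className));
        System.out.println("declares " + methodName + ": " + declaresMethod(className, methodName));
    }
}
```

Running it on the SQL client's classpath with the arguments `org.apache.flink.table.planner.delegation.PlannerContext createSqlExprToRexConverter` would show which jar supplies `PlannerContext`. The name-only check is deliberately coarse: the error above refers to one specific signature, so a matching name does not rule out a mismatch.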

 





[VOTE] FLIP-172: Support custom transactional.id prefix in FlinkKafkaProducer

2021-06-28 Thread Wenhao Ji
Hi everyone,

I would like to start a vote on FLIP-172 [1] which was discussed in
this thread [2].
The vote will be open for at least 72 hours until July 1 unless there
is an objection or not enough votes.

Thanks,
Wenhao

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-172%3A+Support+custom+transactional.id+prefix+in+FlinkKafkaProducer
[2] 
https://lists.apache.org/thread.html/r67610aa2d4dfdaf3b027b82edd1a3f46771f0d58902a4258d931e5a5%40%3Cdev.flink.apache.org%3E


[jira] [Created] (FLINK-23170) Write metadata after materialization

2021-06-28 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-23170:
-

 Summary: Write metadata after materialization
 Key: FLINK-23170
 URL: https://issues.apache.org/jira/browse/FLINK-23170
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / State Backends
Reporter: Roman Khachatryan
 Fix For: 1.14.0


Currently, the changelog state backend writes state metadata on first state access. 
It is written to the changelog.
On materialization, the changelog can be truncated, so the metadata needs to be 
written again.

This can be achieved by resetting the AbstractStateChangeLogger.metaDataWritten 
flag. 
It can be further optimized by storing the SQN at which the metadata was 
written and only resetting the flag if materializedSqn >= metadataSqn; but 
materialization is relatively rare, so it is probably not worth it.
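A minimal sketch of both variants, assuming that a materialization allows truncating every changelog entry with an SQN up to and including `materializedSqn`. The class below is hypothetical and only mirrors the `metaDataWritten` flag named above; it is not Flink's `AbstractStateChangeLogger`.

```java
/** Hypothetical tracker mirroring the metaDataWritten flag; not Flink's AbstractStateChangeLogger. */
final class MetadataTracker {
    private boolean metaDataWritten = false;
    private long metadataSqn = -1L; // SQN at which the metadata was last written

    /** Called before appending a change at currentSqn; true means "write metadata first". */
    boolean needsMetadata(long currentSqn) {
        if (metaDataWritten) {
            return false;
        }
        metaDataWritten = true;
        metadataSqn = currentSqn;
        return true;
    }

    /** Simple variant: every materialization forces the metadata to be written again. */
    void resetOnMaterialization() {
        metaDataWritten = false;
    }

    /** Optimized variant: reset only if the truncated prefix contains the metadata entry. */
    void onMaterialized(long materializedSqn) {
        if (materializedSqn >= metadataSqn) {
            metaDataWritten = false;
        }
    }

    public static void main(String[] args) {
        MetadataTracker tracker = new MetadataTracker();
        System.out.println(tracker.needsMetadata(5)); // true: first state access
        tracker.onMaterialized(4);
        System.out.println(tracker.needsMetadata(6)); // false: entry at SQN 5 survives truncation
        tracker.onMaterialized(6);
        System.out.println(tracker.needsMetadata(7)); // true: metadata was truncated away
    }
}
```

The optimized variant costs one extra `long` per logger; given that materialization is rare, the simple reset is likely sufficient, as the issue notes.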





[jira] [Created] (FLINK-23169) Support user-level app staging directory when yarn.staging-directory is specified

2021-06-28 Thread jinfeng (Jira)
jinfeng created FLINK-23169:
---

 Summary: Support user-level app staging directory when yarn.staging-directory is specified
 Key: FLINK-23169
 URL: https://issues.apache.org/jira/browse/FLINK-23169
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Reporter: jinfeng


When yarn.staging-directory is specified, different users will use the same 
directory as the staging directory. This may not be friendly for a job platform 
that submits jobs on behalf of different users. I propose to use a user-level 
directory by default when yarn.staging-directory is specified. We only need to 
make small changes to the `getStagingDir` function in `YarnClusterDescriptor`.
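A minimal sketch of the proposed behaviour, assuming a per-user layout of `<yarn.staging-directory>/<user>` and keeping a fallback to the home directory when the option is unset (assumed to match the current behaviour). The class and its string-based signature are hypothetical; a real change to `getStagingDir` would operate on Hadoop `Path` objects rather than strings.

```java
/**
 * Hypothetical sketch: the name mirrors YarnClusterDescriptor#getStagingDir,
 * but this string-based signature and the per-user layout are assumptions.
 */
final class UserStagingDir {

    /** Returns the staging directory to use for the submitting user. */
    static String stagingDirFor(String configuredStagingDir, String homeDir, String user) {
        if (configuredStagingDir == null || configuredStagingDir.isEmpty()) {
            // Option not set: fall back to the submitting user's home directory.
            return homeDir;
        }
        // Option set: give every submitting user their own sub-directory.
        String base = configuredStagingDir.endsWith("/")
                ? configuredStagingDir.substring(0, configuredStagingDir.length() - 1)
                : configuredStagingDir;
        return base + "/" + user;
    }

    public static void main(String[] args) {
        // Prints hdfs:///flink/staging/alice
        System.out.println(stagingDirFor("hdfs:///flink/staging", "hdfs:///user/alice", "alice"));
    }
}
```

In a real implementation the user name would presumably come from the submitting Hadoop user (for example via `UserGroupInformation`), and the shared base directory would need permissions that let each user create their own sub-directory.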





Re: [VOTE] FLIP-147: Support Checkpoint After Tasks Finished

2021-06-28 Thread Dawid Wysakowicz
+1 (binding)

Best,

Dawid

On 28/06/2021 10:45, Yun Gao wrote:
> Hi all,
>
> Since the last vote on FLIP-147 [1], which aims to support checkpoints after
> tasks have finished and modifies the operator API and implementation to ensure
> the commit of the last piece of data, we have had more discussions [2][3] and
> a few updates, including changes to PublicEvolving API. I'd like to have
> another VOTE on the current state of the FLIP.
>
> The vote will last at least 72 hours (Jul 1st), following the consensus
> voting process.
>
> thanks,
>  Yun
>
>
> [1] https://cwiki.apache.org/confluence/x/mw-ZCQ
> [2] 
> https://lists.apache.org/thread.html/r400da9898ff66fd613c25efea15de440a86f14758ceeae4950ea25cf%40%3Cdev.flink.apache.org
> [3] 
> https://lists.apache.org/thread.html/r3953df796ef5ac67d5be9f2251a95ad72efbca31f1d1555d13e71197%40%3Cdev.flink.apache.org%3E





Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-28 Thread Till Rohrmann
+1, thanks for updating the guidelines Xintong!

Cheers,
Till

On Mon, Jun 28, 2021 at 11:49 AM Yangze Guo  wrote:

> +1
>
> Thanks Xintong for drafting this doc.
>
> Best,
> Yangze Guo
>
> On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG  wrote:
> >
> > Thanks Xintong for the detailed documentation.
> >
> > The best practice for handling test failures is very detailed; it's a good
> > guidelines document with clear action steps.
> >
> > +1 to Xintong's proposal.
> >
> > On Mon, Jun 28, 2021 at 4:07 PM Xintong Song wrote:
> >
> > > Thanks all for the discussion.
> > >
> > > Based on the opinions so far, I've drafted the new guidelines [1] as a
> > > potential replacement for the original wiki page [2].
> > >
> > > Hopefully this draft covers most of the opinions discussed and the
> > > consensus reached in this discussion thread.
> > >
> > > Looking forward to your feedback.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > > [1]
> > > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> > >
> > > [2]
> > > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> > >
> > >
> > >
> > > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski wrote:
> > >
> > > > Thanks for the clarification Till. +1 for what you have written.
> > > >
> > > > Piotrek
> > > >
> > > > On Fri, Jun 25, 2021 at 4:00 PM Till Rohrmann wrote:
> > > >
> > > > > One quick note for clarification. I don't have anything against
> > > > > builds running on your personal Azure account, and this is not what
> > > > > I understood under "local environment". For me, "local environment"
> > > > > means that someone runs the test locally on his machine and then
> > > > > says that the tests have passed locally.
> > > > >
> > > > > I do agree that there might be a conflict of interests if a PR
> > > > > author disables tests. Here I would argue that we don't have
> > > > > malignant committers, which means that every committer will probably
> > > > > first check the respective ticket for how often the test failed.
> > > > > Then I guess the next step would be to discuss on the ticket whether
> > > > > to disable it or not. And finally, after reaching a consensus, it
> > > > > will be disabled. If we see someone abusing this policy, then we can
> > > > > still think about how to guard against it. But, honestly, I have
> > > > > very rarely seen such a case. I am also ok to pull in the release
> > > > > manager to make the final call if this resolves concerns.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski <pnowoj...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > +1 for the general idea, however I have concerns about a couple of
> > > > > > details.
> > > > > >
> > > > > > > I would first try to not introduce the exception for local builds.
> > > > > > > It makes it quite hard for others to verify the build and to make
> > > > > > > sure that the right things were executed.
> > > > > >
> > > > > > I would counter Till's proposal to ignore local green builds. If a
> > > > > > committer is merging and closing a PR with an official Azure failure,
> > > > > > but there was a green build before or in local Azure, IMO it's
> > > > > > enough to leave the message:
> > > > > >
> > > > > > > Latest build failure is a known issue: FLINK-12345
> > > > > > > Green local build: URL
> > > > > >
> > > > > > This should address Till's concern about verification.
> > > > > >
> > > > > > On the other hand, I have concerns about disabling tests. *It
> > > > > > shouldn't be the PR author/committer that's disabling a test on his
> > > > > > own, as that's a conflict of interests*. I have, however, no problems
> > > > > > with disabling test instabilities that were marked as "blockers";
> > > > > > that should work pretty well. But the important thing here is to
> > > > > > correctly judge bumping priorities of test instabilities based on
> > > > > > their frequency and the current general health of the system. I
> > > > > > believe that release managers should be playing a big role here in
> > > > > > deciding on the guidelines for what the priority of certain test
> > > > > > instabilities should be.
> > > > > >
> > > > > > What I mean by that is two example scenarios:
> > > > > > 1. If we have a handful of very frequently failing tests and a
> > > > > > handful of very rarely failing tests (like one reported failure and
> > > > > > no other occurrence in many months, and let's even say that the
> > > > > > failure looks like an infrastructure/network timeout), we should
> > > > > > focus on the frequently failing ones, and we are probably safe to
> > > > > > ignore the rare issues for the time being, at least until we deal
> > > > > > with the most pressing ones.
> > > > > > 2. If 

Re: Flink 1.14. Bi-weekly 2021-06-22

2021-06-28 Thread Till Rohrmann
Thanks a lot for the update, Joe. This is very helpful!

Cheers,
Till

On Mon, Jun 28, 2021 at 10:10 AM Xintong Song  wrote:

> Thanks for the update, Joe.
>
> Thank you~
>
> Xintong Song
>
>
> On Mon, Jun 28, 2021 at 3:54 PM Johannes Moser 
> wrote:
>
> > Hello,
> >
> > Last Tuesday was our second bi-weekly.
> >
> > You can read about the outcome on the Confluence wiki page [1].
> >
> > *Feature freeze date*
> > As we didn't come to a clear agreement, we will keep the anticipated
> > feature freeze date as it is: early August.
> >
> > *Build stability*
> > The good thing: we decreased the number of issues, the not so good thing:
> > only by ten.
> > We as a community need to put further effort into this.
> >
> > *Dependencies*
> > We'd like to ask all contributors to have a look at the components they
> > are heavily involved with to see if any dependencies require updating.
> > Recently, some users had issues passing their security scans. In the
> > future, this should become a default step at the beginning of every
> > release cycle.
> >
> > *Criteria for merging PRs*
> > We want to avoid merging PRs with unrelated CI failures. We are quite
> > aware that we
> > need to raise the importance of the Docker caching issue.
> >
> > What can you do to make the Flink 1.14. release a good one:
> > * Identify and update outdated dependencies
> > * Get rid of test instabilities
> > * Don't merge PRs including unrelated CI failures
> >
> > Best,
> > Joe
> >
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/1.14+Release
>


Re: [DISCUSS] FLIP-172: Support custom transactional.id prefix in FlinkKafkaProducer

2021-06-28 Thread Stephan Ewen
Sounds good from my side, please go ahead.

On Fri, Jun 25, 2021 at 5:31 PM Wenhao Ji  wrote:

> Thanks Stephan and Piotr for your replies. It seems that there is no
> problem or concern about this feature. If there is no further
> objection, I will start a vote thread for FLIP-172.
>
> Thanks,
> Wenhao
>
> On Wed, Jun 23, 2021 at 3:41 PM Piotr Nowojski 
> wrote:
> >
> > Hi,
> >
> > +1 from my side on this idea. I do not see any problems that could be
> > caused by this change.
> >
> > Best,
> > Piotrek
> >
> > On Wed, Jun 23, 2021 at 8:59 AM Stephan Ewen wrote:
> >
> > > The motivation and the proposal sound good to me, +1 from my side.
> > >
> > > It would be good to have a quick opinion from someone who has worked
> > > specifically with Kafka, maybe Becket or Piotr?
> > >
> > > Best,
> > > Stephan
> > >
> > >
> > > On Sat, Jun 12, 2021 at 9:50 AM Wenhao Ji wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> I would like to open this discussion thread to talk about FLIP-172
> > >> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-172%3A+Support+custom+transactional.id+prefix+in+FlinkKafkaProducer>,
> > >> which aims to provide a way to support specifying a custom
> > >> transactional.id prefix in the FlinkKafkaProducer class.
> > >>
> > >> I am looking forward to your feedback and suggestions!
> > >>
> > >> Thanks,
> > >> Wenhao
> > >>
> > >
>


[jira] [Created] (FLINK-23168) Catalog shouldn't merge properties for alter DB operation

2021-06-28 Thread Rui Li (Jira)
Rui Li created FLINK-23168:
--

 Summary: Catalog shouldn't merge properties for alter DB operation
 Key: FLINK-23168
 URL: https://issues.apache.org/jira/browse/FLINK-23168
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Hive, Table SQL / API
Reporter: Rui Li








Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-28 Thread Yangze Guo
+1

Thanks Xintong for drafting this doc.

Best,
Yangze Guo

On Mon, Jun 28, 2021 at 5:42 PM JING ZHANG  wrote:
>
> Thanks Xintong for giving detailed documentation.
>
> The best practice for handling test failures is very detailed; it's a good
> guideline document with clear action steps.
>
> +1 to Xintong's proposal.
>
> On Mon, Jun 28, 2021 at 4:07 PM Xintong Song wrote:
>
> > Thanks all for the discussion.
> >
> > Based on the opinions so far, I've drafted the new guidelines [1] as a
> > potential replacement for the original wiki page [2].
> >
> > Hopefully this draft covers most of the opinions discussed and the
> > consensus reached in this discussion thread.
> >
> > Looking forward to your feedback.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> > [1]
> >
> > https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
> >
> > [2]
> > https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
> >
> >
> >
> > On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski 
> > wrote:
> >
> > > Thanks for the clarification, Till. +1 for what you have written.
> > >
> > > Piotrek
> > >
> > > On Fri, Jun 25, 2021 at 4:00 PM Till Rohrmann wrote:
> > >
> > > > One quick note for clarification. I don't have anything against builds
> > > > running on your personal Azure account, and this is not what I
> > > > understood by "local environment". For me, "local environment" means
> > > > that someone runs the test locally on his machine and then says that
> > > > the tests have passed locally.
> > > >
> > > > I do agree that there might be a conflict of interest if a PR author
> > > > disables tests. Here I would argue that we don't have malicious
> > > > committers, which means that every committer will probably first check
> > > > the respective ticket for how often the test failed. Then I guess the
> > > > next step would be to discuss on the ticket whether to disable it or
> > > > not. And finally, after reaching a consensus, it will be disabled. If we
> > > > see someone abusing this policy, then we can still think about how to
> > > > guard against it. But, honestly, I have very rarely seen such a case. I
> > > > am also ok with pulling in the release manager to make the final call
> > > > if this resolves concerns.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski 
> > > > wrote:
> > > >
> > > > > +1 for the general idea; however, I have concerns about a couple of
> > > > > details.
> > > > >
> > > > > > I would first try to not introduce the exception for local builds.
> > > > > > It makes it quite hard for others to verify the build and to make
> > > > > > sure that the right things were executed.
> > > > >
> > > > > I would counter Till's proposal to ignore local green builds. If a
> > > > > committer is merging and closing a PR with an official Azure failure,
> > > > > but there was a green build before, or one on a personal Azure account,
> > > > > it's IMO enough to leave the message:
> > > > >
> > > > > > Latest build failure is a known issue: FLINK-12345
> > > > > > Green local build: URL
> > > > >
> > > > > This should address Till's concern about verification.
> > > > >
> > > > > On the other hand, I have concerns about disabling tests. *It shouldn't
> > > > > be the PR author/committer that's disabling a test on his own, as
> > > > > that's a conflict of interest.* However, I have no problems with
> > > > > disabling test instabilities that were marked as "blockers"; that
> > > > > should work pretty well. But the important thing here is to correctly
> > > > > judge when to bump the priorities of test instabilities based on their
> > > > > frequency and the current general health of the system. I believe that
> > > > > release managers should be playing a big role here in deciding on the
> > > > > guidelines for what the priority of certain test instabilities should
> > > > > be.
> > > > >
> > > > > What I mean by that is illustrated by two example scenarios:
> > > > > 1. If we have a handful of very frequently failing tests and a handful
> > > > > of very rarely failing tests (like one reported failure and no other
> > > > > occurrence in many months, and let's even say that the failure looks
> > > > > like an infrastructure/network timeout), we should focus on the
> > > > > frequently failing ones, and we are probably safe to ignore the rare
> > > > > issues for the time being - at least until we deal with the most
> > > > > pressing ones.
> > > > > 2. If we have tons of rarely failing test instabilities, we should
> > > > > probably start addressing them as blockers.
> > > > >
> > > > > I'm using my own conscience and my best judgement when I'm
> > > > > bumping/decreasing priorities of test instabilities (and bugs), but as
> > > > > an individual committer I don't have the full picture. As I wrote
> > > > > above, I think release managers are in a much better position to keep

Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-28 Thread JING ZHANG
Thanks Xintong for giving detailed documentation.

The best practice for handling test failures is very detailed; it's a good
guideline document with clear action steps.

+1 to Xintong's proposal.

On Mon, Jun 28, 2021 at 4:07 PM Xintong Song wrote:

> Thanks all for the discussion.
>
> Based on the opinions so far, I've drafted the new guidelines [1] as a
> potential replacement for the original wiki page [2].
>
> Hopefully this draft covers most of the opinions discussed and the
> consensus reached in this discussion thread.
>
> Looking forward to your feedback.
>
> Thank you~
>
> Xintong Song
>
>
> [1]
>
> https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing
>
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests
>
>
>
> On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski 
> wrote:
>
> > Thanks for the clarification, Till. +1 for what you have written.
> >
> > Piotrek
> >
> > On Fri, Jun 25, 2021 at 4:00 PM Till Rohrmann wrote:
> >
> > > One quick note for clarification. I don't have anything against builds
> > > running on your personal Azure account, and this is not what I
> > > understood by "local environment". For me, "local environment" means
> > > that someone runs the test locally on his machine and then says that
> > > the tests have passed locally.
> > >
> > > I do agree that there might be a conflict of interest if a PR author
> > > disables tests. Here I would argue that we don't have malicious
> > > committers, which means that every committer will probably first check
> > > the respective ticket for how often the test failed. Then I guess the
> > > next step would be to discuss on the ticket whether to disable it or
> > > not. And finally, after reaching a consensus, it will be disabled. If we
> > > see someone abusing this policy, then we can still think about how to
> > > guard against it. But, honestly, I have very rarely seen such a case. I
> > > am also ok with pulling in the release manager to make the final call
> > > if this resolves concerns.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski 
> > > wrote:
> > >
> > > > +1 for the general idea; however, I have concerns about a couple of
> > > > details.
> > > >
> > > > > I would first try to not introduce the exception for local builds.
> > > > > It makes it quite hard for others to verify the build and to make
> > > > > sure that the right things were executed.
> > > >
> > > > I would counter Till's proposal to ignore local green builds. If a
> > > > committer is merging and closing a PR with an official Azure failure,
> > > > but there was a green build before, or one on a personal Azure account,
> > > > it's IMO enough to leave the message:
> > > >
> > > > > Latest build failure is a known issue: FLINK-12345
> > > > > Green local build: URL
> > > >
> > > > This should address Till's concern about verification.
> > > >
> > > > On the other hand, I have concerns about disabling tests. *It shouldn't
> > > > be the PR author/committer that's disabling a test on his own, as
> > > > that's a conflict of interest.* However, I have no problems with
> > > > disabling test instabilities that were marked as "blockers"; that
> > > > should work pretty well. But the important thing here is to correctly
> > > > judge when to bump the priorities of test instabilities based on their
> > > > frequency and the current general health of the system. I believe that
> > > > release managers should be playing a big role here in deciding on the
> > > > guidelines for what the priority of certain test instabilities should
> > > > be.
> > > >
> > > > What I mean by that is illustrated by two example scenarios:
> > > > 1. If we have a handful of very frequently failing tests and a handful
> > > > of very rarely failing tests (like one reported failure and no other
> > > > occurrence in many months, and let's even say that the failure looks
> > > > like an infrastructure/network timeout), we should focus on the
> > > > frequently failing ones, and we are probably safe to ignore the rare
> > > > issues for the time being - at least until we deal with the most
> > > > pressing ones.
> > > > 2. If we have tons of rarely failing test instabilities, we should
> > > > probably start addressing them as blockers.
> > > >
> > > > I'm using my own conscience and my best judgement when I'm
> > > > bumping/decreasing priorities of test instabilities (and bugs), but as
> > > > an individual committer I don't have the full picture. As I wrote
> > > > above, I think release managers are in a much better position to keep
> > > > adjusting those kinds of guidelines.
> > > >
> > > > Best, Piotrek
> > > >
> > > > On Fri, Jun 25, 2021 at 8:10 AM Yu Li wrote:
> > > >
> > > > > +1 for Xintong's proposal.
> > > > >
> > > > > For me, resolving problems directly (fixing the infrastructure issue,
> > > > > disabling unstable tests and creating blocker JIRAs to track the fix
> > > > > and

[jira] [Created] (FLINK-23167) Port Kinesis Table API e2e tests to release-1.12 branch

2021-06-28 Thread Emre Kartoglu (Jira)
Emre Kartoglu created FLINK-23167:
-

 Summary: Port Kinesis Table API e2e tests to release-1.12 branch
 Key: FLINK-23167
 URL: https://issues.apache.org/jira/browse/FLINK-23167
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kinesis
Affects Versions: 1.12.6
Reporter: Emre Kartoglu


https://issues.apache.org/jira/browse/FLINK-20042 added e2e tests for the 
Kinesis Table API. This was only done for versions >=1.13 however.

We need to port these tests to the release-1.12 branch, since version 1.12
supports the same functionality and needs the same (or a similar) test.





[VOTE] FLIP-147: Support Checkpoint After Tasks Finished

2021-06-28 Thread Yun Gao
Hi all,

For FLIP-147 [1], which aims to support checkpoints after tasks have finished
and to modify the operator API and implementation to ensure the commit of the
last piece of data, we have had more discussions [2][3] and a few updates since
the last vote, including changes to the PublicEvolving API. I'd like to have
another VOTE on the current state of the FLIP.

The vote will last at least 72 hours (Jul 1st), following the consensus
voting process.

thanks,
 Yun


[1] https://cwiki.apache.org/confluence/x/mw-ZCQ
[2] 
https://lists.apache.org/thread.html/r400da9898ff66fd613c25efea15de440a86f14758ceeae4950ea25cf%40%3Cdev.flink.apache.org
[3] 
https://lists.apache.org/thread.html/r3953df796ef5ac67d5be9f2251a95ad72efbca31f1d1555d13e71197%40%3Cdev.flink.apache.org%3E

Re: Flink 1.14. Bi-weekly 2021-06-22

2021-06-28 Thread Xintong Song
Thanks for the update, Joe.

Thank you~

Xintong Song


On Mon, Jun 28, 2021 at 3:54 PM Johannes Moser 
wrote:

> Hello,
>
> Last Tuesday was our second bi-weekly.
>
> You can read up on the outcome on the Confluence wiki page [1].
>
> *Feature freeze date*
> As we didn't come to a clear agreement, we will keep the anticipated
> feature freeze date as it is, in early August.
>
> *Build stability*
> The good thing: we decreased the number of issues. The not-so-good thing:
> only by ten.
> We as a community need to put further effort into this.
>
> *Dependencies*
> We'd like to ask all contributors to have a look at the components they
> are heavily involved with to see if any dependencies require updating.
> Recently, some users had issues passing security scans. In the future,
> this check should become a default step at the beginning of every release
> cycle.
>
> *Criteria for merging PRs*
> We want to avoid merging PRs with unrelated CI failures. We are quite
> aware that we need to raise the importance of the Docker caching issue.
>
> What can you do to make the Flink 1.14 release a good one:
> * Identify and update outdated dependencies
> * Get rid of test instabilities
> * Don't merge PRs including unrelated CI failures
>
> Best,
> Joe
>
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/1.14+Release


Re: [DISCUSS] Do not merge PRs with "unrelated" test failures.

2021-06-28 Thread Xintong Song
Thanks all for the discussion.

Based on the opinions so far, I've drafted the new guidelines [1] as a
potential replacement for the original wiki page [2].

Hopefully this draft covers most of the opinions discussed and the
consensus reached in this discussion thread.

Looking forward to your feedback.

Thank you~

Xintong Song


[1]
https://docs.google.com/document/d/1uUbxbgbGErBXtmEjhwVhBWG3i6nhQ0LXs96OlntEYnU/edit?usp=sharing

[2] https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests



On Fri, Jun 25, 2021 at 10:40 PM Piotr Nowojski 
wrote:

> Thanks for the clarification, Till. +1 for what you have written.
>
> Piotrek
>
> On Fri, Jun 25, 2021 at 4:00 PM Till Rohrmann wrote:
>
> > One quick note for clarification. I don't have anything against builds
> > running on your personal Azure account, and this is not what I
> > understood by "local environment". For me, "local environment" means
> > that someone runs the test locally on his machine and then says that
> > the tests have passed locally.
> >
> > I do agree that there might be a conflict of interest if a PR author
> > disables tests. Here I would argue that we don't have malicious
> > committers, which means that every committer will probably first check
> > the respective ticket for how often the test failed. Then I guess the
> > next step would be to discuss on the ticket whether to disable it or
> > not. And finally, after reaching a consensus, it will be disabled. If we
> > see someone abusing this policy, then we can still think about how to
> > guard against it. But, honestly, I have very rarely seen such a case. I
> > am also ok with pulling in the release manager to make the final call
> > if this resolves concerns.
> >
> > Cheers,
> > Till
> >
> > On Fri, Jun 25, 2021 at 9:07 AM Piotr Nowojski 
> > wrote:
> >
> > > +1 for the general idea; however, I have concerns about a couple of
> > > details.
> > >
> > > > I would first try to not introduce the exception for local builds.
> > > > It makes it quite hard for others to verify the build and to make
> > > > sure that the right things were executed.
> > >
> > > I would counter Till's proposal to ignore local green builds. If a
> > > committer is merging and closing a PR with an official Azure failure,
> > > but there was a green build before, or one on a personal Azure account,
> > > it's IMO enough to leave the message:
> > >
> > > > Latest build failure is a known issue: FLINK-12345
> > > > Green local build: URL
> > >
> > > This should address Till's concern about verification.
> > >
> > > On the other hand, I have concerns about disabling tests. *It shouldn't
> > > be the PR author/committer that's disabling a test on his own, as
> > > that's a conflict of interest.* However, I have no problems with
> > > disabling test instabilities that were marked as "blockers"; that
> > > should work pretty well. But the important thing here is to correctly
> > > judge when to bump the priorities of test instabilities based on their
> > > frequency and the current general health of the system. I believe that
> > > release managers should be playing a big role here in deciding on the
> > > guidelines for what the priority of certain test instabilities should
> > > be.
> > >
> > > What I mean by that is illustrated by two example scenarios:
> > > 1. If we have a handful of very frequently failing tests and a handful
> > > of very rarely failing tests (like one reported failure and no other
> > > occurrence in many months, and let's even say that the failure looks
> > > like an infrastructure/network timeout), we should focus on the
> > > frequently failing ones, and we are probably safe to ignore the rare
> > > issues for the time being - at least until we deal with the most
> > > pressing ones.
> > > 2. If we have tons of rarely failing test instabilities, we should
> > > probably start addressing them as blockers.
> > >
> > > I'm using my own conscience and my best judgement when I'm
> > > bumping/decreasing priorities of test instabilities (and bugs), but as
> > > an individual committer I don't have the full picture. As I wrote
> > > above, I think release managers are in a much better position to keep
> > > adjusting those kinds of guidelines.
> > >
> > > Best, Piotrek
> > >
> > > On Fri, Jun 25, 2021 at 8:10 AM Yu Li wrote:
> > >
> > > > +1 for Xintong's proposal.
> > > >
> > > > For me, resolving problems directly (fixing the infrastructure issue,
> > > > disabling unstable tests and creating blocker JIRAs to track the fix
> > > > and re-enable them asap, etc.) is (in most cases) better than working
> > > > around them (verifying locally, manually checking and judging the
> > > > failure as "unrelated", etc.), and I believe the proposal could help us
> > > > push those more "real" solutions forward.
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > >
> > > > On Fri, 25 Jun 2021 at 10:58, Yangze Guo  wrote:
> > > >
> > > > > Creating a blocker issue for the manually disabled tests sounds good
> > > > > to me.

Re: [Discuss] Planning Flink 1.14

2021-06-28 Thread Johannes Moser
Hi all,

We discussed the matter again in our latest release planning meeting (see [1]).
We see a lot of valid points in this thread. As we were not able to come to a
clear conclusion within the meeting, and most of the arguments mentioned would
still be valid even if we extended the feature freeze by a month, we are keeping
it at early August for now. I will collect all the inputs and talk to some users
to further improve the experience, also for those who have extended Flink.

Best Joe


[1] https://cwiki.apache.org/confluence/display/FLINK/1.14+Release

> On 07.06.2021, at 05:30, Benchao Li  wrote:
> 
> Hi all,
> 
> Thanks Xintong for bringing this up.
> 
> I would like to share some experience of using Flink in our company
> (ByteDance).
>
> 1. We started building our SQL platform in mid 2019, using the v1.9 blink
> planner, and it was amazing.
> We also added many internal features which were still missing in that
> version, including DDL, computed columns, a lot of internal formats and
> connectors, and some other planner changes.
>
> 2. In early 2020, we planned to upgrade to v1.10. Before we finished
> cherry-picking internal commits to v1.10, we found that v1.11 was going to
> be released soon. Hence we decided to upgrade to v1.11.
> By late 2020, we had almost finished the internal feature cherry-picking
> work. (It took us so long because we were still adding new features to our
> online version v1.9 at the same time.)
>
> 3. Now
> Although we did a lot of work to reduce the overhead for our users of
> upgrading from v1.9 to v1.11, this process is still slow, because:
> a) All the connector/format properties changed (although we have a tool
> for them to upgrade in one click, there is still a lot of learning cost)
> b) The checkpoint cannot be upgraded
>
> 4. Future
> We have 5000+ online SQL jobs and hundreds of commits, so we do not plan
> to do an upgrade in the short term.
> However, v1.11 still lacks a lot of features, for example:
> a) new UDF type inference does not support aggregate functions
> b) FLIP-27 new source interface cannot be used in SQL
> We may need to do a lot of cherry-picking to our v1.11.
>
> So, from our point of view, a longer release cycle and more fully finished
> features may benefit us a lot.
> 
> 
> On Fri, Jun 4, 2021 at 6:02 PM JING ZHANG wrote:
> 
>> Hi all,
>> 
>> @Xintong Song
>> Thanks for reminding me, I will contact Jark to update the wiki page.
>>
>> Besides, I'd like to provide more input by sharing our experience of
>> upgrading our internal version of Flink.
>>
>> Flink has been widely used in the production environment in our company
>> since 2018. Our internal version is about one year behind the latest
>> stable version of the community. We upgraded our internal Flink version to
>> 1.10 in March last year, and we plan to upgrade directly to 1.13 next month
>> (skipping 1.11 and 1.12). We wish to use the latest version as soon as
>> possible. However, in practice we only catch up with the community's latest
>> stable release about once a year, because upgrading to a new version is a
>> time-consuming process.
>>
>> The detailed work is as follows.
>>
>> a. Before releasing a new internal version
>> 1) Required: Cherry-pick internal features to the new Flink branch. A few
>> features need to be redeveloped based on the new branch code base.
>> BTW, the cost grows heavier as we maintain more and more internal
>> features in our internal version.
>> 2) Optional: Some internal connectors need to adapt to the new API
>> 3) Required: Surrounding products need to be updated based on the new API,
>> for example, the internal Flink SQL web development platform
>> 4) Required: Regression tests
>>
>> b. After the release, encourage users to upgrade existing jobs (thousands
>> of jobs) to the new version. Users need some time to:
>> 1) Repackage the jars of DataStream jobs
>> 2) For critical jobs, run the jobs on both versions at the same time for a
>> while, and migrate to the new job only after comparing the data carefully.
>> 3) Pure ETL SQL jobs are easy to bump up. But other Flink SQL jobs with
>> stateful operators need extra effort, because Flink SQL jobs do not
>> support state compatibility yet.
>> 
>> Best regards,
>> JING ZHANG
>> 
>> On Fri, Jun 4, 2021 at 2:27 PM Prasanna kumar wrote:
>> 
>>> Hi all,
>>> 
>>> We are using Flink for our eventing system. Overall we are very happy with
>>> the tech, the documentation, the community support and the quick replies
>>> on the mailing list.
>>>
>>> My experience with versions over the last year:
>>>
>>> We were working on 1.10 initially during our research phase, then we
>>> stabilised on 1.11 as we moved on, but by the time we were about to get
>>> into production, 1.12 was released. As with all software and products,
>>> there were bugs reported. So we waited till 1.12.2 was released and then
>>> upgraded. Within a month of us doing it, 1.13 got released.
>>>
>>> But based on past experience, we waited till at least a couple of minor
>>> versions (fixing bugs)

Flink 1.14. Bi-weekly 2021-06-22

2021-06-28 Thread Johannes Moser
Hello,

Last Tuesday was our second bi-weekly.

You can read up on the outcome on the Confluence wiki page [1].

*Feature freeze date*
As we didn't come to a clear agreement, we will keep the anticipated feature
freeze date as it is, in early August.

*Build stability*
The good thing: we decreased the number of issues. The not-so-good thing:
only by ten.
We as a community need to put further effort into this.

*Dependencies*
We'd like to ask all contributors to have a look at the components they are
heavily involved with to see if any dependencies require updating. Recently,
some users had issues passing security scans. In the future, this check
should become a default step at the beginning of every release cycle.

*Criteria for merging PRs*
We want to avoid merging PRs with unrelated CI failures. We are quite aware
that we need to raise the importance of the Docker caching issue.

What can you do to make the Flink 1.14 release a good one:
* Identify and update outdated dependencies
* Get rid of test instabilities
* Don't merge PRs including unrelated CI failures

Best,
Joe


[1] https://cwiki.apache.org/confluence/display/FLINK/1.14+Release

[jira] [Created] (FLINK-23166) ZipUtils doesn't handle properly for softlinks inside the zip file

2021-06-28 Thread Dian Fu (Jira)
Dian Fu created FLINK-23166:
---

 Summary: ZipUtils doesn't handle properly for softlinks inside the zip file
 Key: FLINK-23166
 URL: https://issues.apache.org/jira/browse/FLINK-23166
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.10.0
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.11.4, 1.14.0, 1.12.5, 1.13.2








[jira] [Created] (FLINK-23165) Add StreamExecutionEnvironment#registerSlotSharingGroup to PyFlink

2021-06-28 Thread Yangze Guo (Jira)
Yangze Guo created FLINK-23165:
--

 Summary: Add StreamExecutionEnvironment#registerSlotSharingGroup to PyFlink
 Key: FLINK-23165
 URL: https://issues.apache.org/jira/browse/FLINK-23165
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Reporter: Yangze Guo
 Fix For: 1.14.0





