[jira] [Created] (FLINK-21631) Add CountTrigger for Python group window aggregation

2021-03-04 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-21631:


 Summary: Add CountTrigger for Python group window aggregation
 Key: FLINK-21631
 URL: https://issues.apache.org/jira/browse/FLINK-21631
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Affects Versions: 1.13.0
Reporter: Huang Xingbo
Assignee: Huang Xingbo
 Fix For: 1.13.0


Add CountTrigger for Python group window aggregation
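
For context, the Java Table API already expresses count-based group windows roughly as
follows; this ticket would bring the equivalent trigger support to the Python API. The
snippet is illustrative only: the {{input}} table and its {{user}}, {{amount}}, and
{{proctime}} columns are assumed for the example.

{code:java}
// Illustrative sketch: a row-interval (count-based) tumbling group window,
// which is driven by a count trigger under the hood. The input table and
// its columns are assumed for this example.
import static org.apache.flink.table.api.Expressions.$;
import static org.apache.flink.table.api.Expressions.rowInterval;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.Tumble;

Table result = input
    .window(Tumble.over(rowInterval(100L)).on($("proctime")).as("w"))
    .groupBy($("w"), $("user"))
    .select($("user"), $("amount").sum().as("total"));
{code}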



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21630) Support Python UDAF in Session Window

2021-03-04 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-21630:


 Summary: Support Python UDAF in Session Window
 Key: FLINK-21630
 URL: https://issues.apache.org/jira/browse/FLINK-21630
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Affects Versions: 1.13.0
Reporter: Huang Xingbo
Assignee: Huang Xingbo
 Fix For: 1.13.0


Support Python UDAF in Session Window



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21629) Support Python UDAF in Sliding Window

2021-03-04 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-21629:


 Summary: Support Python UDAF in Sliding Window
 Key: FLINK-21629
 URL: https://issues.apache.org/jira/browse/FLINK-21629
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Affects Versions: 1.13.0
Reporter: Huang Xingbo
Assignee: Huang Xingbo
 Fix For: 1.13.0


Support Python UDAF in Sliding Window



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21628) Support Python UDAF in Tumbling Window

2021-03-04 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-21628:


 Summary: Support Python UDAF in Tumbling Window
 Key: FLINK-21628
 URL: https://issues.apache.org/jira/browse/FLINK-21628
 Project: Flink
  Issue Type: Sub-task
  Components: API / Python
Affects Versions: 1.13.0
Reporter: Huang Xingbo
Assignee: Huang Xingbo
 Fix For: 1.13.0


Support Python UDAF in Tumbling Window



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21627) When executing multiple INSERT statements with StatementSet, each overriding OPTIONS('table-name'='xxx'), only the first hint takes effect

2021-03-04 Thread wangtaiyang (Jira)
wangtaiyang created FLINK-21627:
---

 Summary: When executing multiple INSERT statements with StatementSet, 
each overriding OPTIONS('table-name'='xxx'), only the first hint takes 
effect
 Key: FLINK-21627
 URL: https://issues.apache.org/jira/browse/FLINK-21627
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.12.0
Reporter: wangtaiyang


{code:java}
// Reproducer (comment translated from the original Chinese "code placeholder"):
StatementSet statementSet = tableEnvironment.createStatementSet();
String sql1 = "insert into test select a,b,c from test_a_12342 "
    + "/*+ OPTIONS('table-name'='test_a_1') */";
String sql2 = "insert into test select a,b,c from test_a_12342 "
    + "/*+ OPTIONS('table-name'='test_a_2') */";
statementSet.addInsertSql(sql1);
statementSet.addInsertSql(sql2);
statementSet.execute();
{code}
With the SQL code above, after execution the data is inserted into the 
test_a_1 table twice, while nothing is inserted into test_a_2. Is this a bug?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21626) Consider shaping newly introduced RuntimeContext.getJobId to return JobID with no Optional wrapper

2021-03-04 Thread Kezhu Wang (Jira)
Kezhu Wang created FLINK-21626:
--

 Summary: Consider shaping newly introduced RuntimeContext.getJobId 
to return JobID with no Optional wrapper
 Key: FLINK-21626
 URL: https://issues.apache.org/jira/browse/FLINK-21626
 Project: Flink
  Issue Type: Improvement
  Components: API / Core
Affects Versions: 1.13.0
Reporter: Kezhu Wang


Currently, the newly introduced {{RuntimeContext.getJobId()}} returns 
{{Optional}}. The only path where it returns no job id is 
{{RuntimeUDFContext}} (through {{CollectionExecutor}} via 
{{CollectionEnvironment}}).

But after {{DataSet}} is dropped, there will be no path that returns no job 
id. Both FLINK-21581 and [my 
comment|https://github.com/apache/flink/pull/15053#issuecomment-789410967] 
raised this concern. But unlike FLINK-21581, I think we could return an 
environment/executor/plan-level unique job id in {{RuntimeUDFContext}} for this 
new API. This way there will be no breaking change after {{DataSet}} is 
dropped. And more importantly, a carefully chosen job id does not hurt callers 
of {{RuntimeUDFContext}}, in my opinion.
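
For reference, a minimal Java sketch of the two shapes under discussion 
(simplified and hypothetical, not the actual interface definitions):

{code:java}
import java.util.Optional;
import org.apache.flink.api.common.JobID;

// Current shape of the newly introduced accessor (simplified):
interface RuntimeContextToday {
    Optional<JobID> getJobId();
}

// Proposed shape, assuming RuntimeUDFContext can always supply a
// (possibly synthetic, environment/executor/plan-level unique) job id:
interface RuntimeContextProposed {
    JobID getJobId();
}
{code}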

cc  [~chesnay] [~roman_khachatryan] [~aljoscha] [~sewen] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[DISCUSS] FLIP-162 follow-up discussion

2021-03-04 Thread Leonard Xu
Hi, all

As discussed in FLIP-162, we agreed that the current time functions’ behavior 
is incorrect and planned to introduce the option 
table.exec.fallback-legacy-time-function to let users fall back to the 
incorrect behavior.

(1) The option is convenient for users who want to upgrade to 1.13 but don't 
want to change their SQL jobs; they only need to set the option value. This is 
the first time users are affected by these wrong functions.

(2) But we didn’t consider that the option will be deleted after one or two 
major versions; users will then have to change their SQL jobs again. This is 
the second time users are affected by these wrong functions.

(3) Besides, maintaining two sets of functions is prone to bugs.

I’ve discussed this with some community developers offline; they tend to fix 
these functions in one go, i.e. correct the wrong functions directly and not 
introduce the option.

Considering that we will delete the configuration eventually, and comparing 
hurting users twice (and bothering them for a long time) with hurting them 
once, I would rather hurt users once.
Thus I am also +1 for directly correcting these wrong functions and removing 
the wrong behavior at the same time.


If we can reach a consensus in this thread, I think we can remove the option 
from FLIP-162.
What do you think?

Best,
Leonard






[jira] [Created] (FLINK-21625) 5.03.2021 Benchmarks are not compiling

2021-03-04 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-21625:
--

 Summary: 5.03.2021 Benchmarks are not compiling
 Key: FLINK-21625
 URL: https://issues.apache.org/jira/browse/FLINK-21625
 Project: Flink
  Issue Type: Bug
  Components: Benchmarks, Runtime / Coordination
Affects Versions: 1.13.0
Reporter: Piotr Nowojski


http://codespeed.dak8s.net:8080/job/flink-master-benchmarks/7497/console


{code:java}
08:46:27  [ERROR] 
/home/jenkins/workspace/flink-master-benchmarks/flink-benchmarks/src/main/java/org/apache/flink/scheduler/benchmark/SchedulerBenchmarkUtils.java:152:32:
  error: incompatible types: JobID cannot be converted to ExecutionAttemptID
08:46:27  [ERROR] 
/home/jenkins/workspace/flink-master-benchmarks/flink-benchmarks/src/main/java/org/apache/flink/scheduler/benchmark/SchedulerBenchmarkUtils.java:169:44:
  error: incompatible types: JobID cannot be converted to ExecutionAttemptID
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21624) Correct FLOOR/CEIL (TIMESTAMP/TIMESTAMP_LTZ/DATE TO WEEK) functions

2021-03-04 Thread Leonard Xu (Jira)
Leonard Xu created FLINK-21624:
--

 Summary: Correct FLOOR/CEIL (TIMESTAMP/TIMESTAMP_LTZ/DATE TO WEEK) 
functions
 Key: FLINK-21624
 URL: https://issues.apache.org/jira/browse/FLINK-21624
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API, Table SQL / Planner
Reporter: Leonard Xu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21623) Introduce CURRENT_ROW_TIMESTAMP function

2021-03-04 Thread Leonard Xu (Jira)
Leonard Xu created FLINK-21623:
--

 Summary: Introduce CURRENT_ROW_TIMESTAMP function
 Key: FLINK-21623
 URL: https://issues.apache.org/jira/browse/FLINK-21623
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API, Table SQL / Planner
Reporter: Leonard Xu
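

The ticket has no description yet; based on the summary alone, usage might 
look like this (illustrative sketch; {{tableEnv}} and the {{Orders}} table are 
assumed):

{code:java}
// Illustrative only: CURRENT_ROW_TIMESTAMP() would presumably be evaluated
// per record rather than once per statement.
tableEnv.executeSql(
    "SELECT order_id, CURRENT_ROW_TIMESTAMP() AS row_ts FROM Orders").print();
{code}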






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21622) Introduce function TO_TIMESTAMP_LTZ(numeric [, precision])

2021-03-04 Thread Leonard Xu (Jira)
Leonard Xu created FLINK-21622:
--

 Summary:  Introduce function TO_TIMESTAMP_LTZ(numeric [, 
precision])
 Key: FLINK-21622
 URL: https://issues.apache.org/jira/browse/FLINK-21622
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API, Table SQL / Planner
Reporter: Leonard Xu
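

The ticket has no description yet; based on the signature in the summary, 
usage might look like this (illustrative sketch; {{tableEnv}} is assumed, and 
the optional precision presumably selects the epoch unit, e.g. 0 for seconds 
and 3 for milliseconds):

{code:java}
// Illustrative only: convert an epoch value to TIMESTAMP_LTZ; the second
// (optional) argument is the precision from the summary's signature.
tableEnv.executeSql("SELECT TO_TIMESTAMP_LTZ(1614844800000, 3) AS ts").print();
{code}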






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21621) Support TIMESTAMP_LTZ arithmetic

2021-03-04 Thread Leonard Xu (Jira)
Leonard Xu created FLINK-21621:
--

 Summary: Support TIMESTAMP_LTZ arithmetic
 Key: FLINK-21621
 URL: https://issues.apache.org/jira/browse/FLINK-21621
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Leonard Xu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21620) [table-planner/table-parser] Support abbreviation TIMESTAMP_LTZ FOR TIMESTAMP WITH LOCAL TIME ZONE

2021-03-04 Thread Leonard Xu (Jira)
Leonard Xu created FLINK-21620:
--

 Summary: [table-planner/table-parser] Support abbreviation 
TIMESTAMP_LTZ FOR TIMESTAMP WITH LOCAL TIME ZONE
 Key: FLINK-21620
 URL: https://issues.apache.org/jira/browse/FLINK-21620
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API, Table SQL / Planner
Reporter: Leonard Xu
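

The ticket has no description yet; based on the summary, the abbreviation 
would make the two column definitions below equivalent (illustrative sketch):

{code:java}
// Illustrative only: with this ticket, ts2 would be shorthand for ts1.
String ddl =
    "CREATE TABLE t ("
        + "  ts1 TIMESTAMP(3) WITH LOCAL TIME ZONE,"
        + "  ts2 TIMESTAMP_LTZ(3)"
        + ")";
{code}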






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-03-04 Thread Yun Gao
Hi Piotr,

Many thanks for the suggestions and thoughts!

> Is this a problem? Pipeline would be empty, so EndOfPartitionEvent would be 
> traveling very quickly.

No, this is not a problem; sorry, I had some wrong thoughts there. Initially I 
was in fact thinking about this issue raised by 
@kezhu:

> Besides this, will FLIP-147 eventually need some ways to decide whether an 
> operator needs a final checkpoint @Yun @Guowei? @Arvid mentioned this in an 
> earlier mail.

For this issue itself, I still lean towards thinking we need it. For example, 
suppose we have a job that does not need to commit anything on finishing; then 
it does not need to wait for a checkpoint at all in the normal
finish case.

> Yes, but aren't we doing it right now anyway? 
> `StreamSource#advanceToEndOfEventTime`?

Yes, we indeed have advanceToEndOfEventTime for both legacy and new sources; 
sorry for the oversight.

> Is this the plan? That upon recovery we are restarting all operators, even 
> those that have already finished? 
Certainly it's one of the possibilities.

For the first version we tend to use this approach since it is easier to 
implement, and we always need to consider the case where tasks are started but 
operators are finished, since there might also be tasks with only part of 
their operators finished. In the long run I think we could further optimize 
this by not restarting the finished tasks at all.

> Keep in mind that those are two separate things, as I mentioned in a previous 
> e-mail:
> > II. When should the `GlobalCommitHandle` be created? Should it be returned 
> > from `snapshotState()`, `notifyCheckpointComplete()` or somewhere else?
> > III. What should be the ordering guarantee between global commit and local 
> > commit, if any? Actually the easiest to implement would be undefined, but 
> > de facto global commit happening before local commits (first invoke 
> > `notifyCheckpointComplete()` on the `OperatorCoordinator` and either after 
> > or in parallel send `notifyCheckpointComplete()` RPCs). As far as I can 
> > tell, undefined order should work for the use cases that I'm aware of.
>
> We could create the `GlobalCommitHandle` in `StreamOperator#snapshotState()`, 
> while we could also ensure that `notifyCheckpointComplete()` is called on the 
> `OperatorCoordinator` AFTER all of the operators have successfully 
> processed `notifyCheckpointComplete()`. This would be more difficult to 
> implement, hence I would prefer "undefined" behaviour here, but it's probably 
> possible.

Many thanks for the further explanation; I totally agree that the two are 
separate and we can think about them distinctly. Regarding the order, I would 
still tend towards supporting the ordered case, since the sinks' 
implementations seem to depend
on this functionality.

Best,
 Yun


--
From:Piotr Nowojski 
Send Time:2021 Mar. 4 (Thu.) 22:56
To:Kezhu Wang 
Cc:Till Rohrmann ; Guowei Ma ; dev 
; Yun Gao ; jingsongl...@gmail.com 

Subject:Re: Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

Hi Yun and Kezhu,

> 1. We might introduce a new type of event to notify the endOfInput() through 
> the graph first, and then A/B waits for the final checkpoint, then A emits 
> EndOfPartitionEvent to exit all the tasks as now.

As I mentioned in one of the PRs, I would opt for this solution.

> If we go towards 1, the tasks would still need to exit from the source, and 
> if we go towards 2/3, we could allow these tasks to finish first.

Is this a problem? Pipeline would be empty, so EndOfPartitionEvent would be 
traveling very quickly.

> should we also need to do it for normal exit 

Yes, but aren't we doing it right now anyway? 
`StreamSource#advanceToEndOfEventTime`? 

> If so, since now for recovery after some tasks finished we would first start 
> all the tasks and stop the finished tasks directly

Is this the plan? That upon recovery we are restarting all operators, even 
those that have already finished? Certainly it's one of the possibilities.

> For example, the global committer handler might write to the meta store for 
> FileSystem/Hive sinks, and these writes should happen after all the pending
> files are renamed to the final ones; otherwise the downstream jobs might miss 
> some files if they rely on the meta store to identify ready partitions.
> Thus we would have to emit the global-committer-handler after 
> notifyCheckpointComplete. But since we can know the list of files
> to rename in snapshotState(), we could create the global-committer-handler 
> and store them there.

Keep in mind that those are two separate things, as I mentioned in a previous 
e-mail:
> II. When should the `GlobalCommitHandle` be created? Should it be returned 
> from `snapshotState()`, `notifyCheckpointComplete()` or somewhere else?
> III. What should be the ordering guarantee between global commit and local commit, 
> if any? Actually the easiest 

[jira] [Created] (FLINK-21619) Remove the optional host and webui-port in jobmanager.sh and use common args instead

2021-03-04 Thread Yang Wang (Jira)
Yang Wang created FLINK-21619:
-

 Summary: Remove the optional host and webui-port in jobmanager.sh 
and use common args instead
 Key: FLINK-21619
 URL: https://issues.apache.org/jira/browse/FLINK-21619
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Scripts
Reporter: Yang Wang


Currently, {{jobmanager.sh}} only has two optional arguments, "host" and 
"webui-port". Moreover, "webui-port" does not work unless "host" is also 
specified.

 

I suggest removing the two optional arguments and using common args instead, 
just like {{standalone-job.sh}}. After that we could support more options, for 
example "--host host1", "--webui-port 8081", and "-Dxx=yy". This is more 
flexible and easier to use.

 

Before:
{code:java}
Usage: jobmanager.sh ((start|start-foreground) [host] 
[webui-port])|stop|stop-all
{code}
After:
{code:java}
Usage: jobmanager.sh ((start|start-foreground) [args])|stop|stop-all
{code}
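
For illustration, an invocation under the proposed scheme could then look like 
this (hypothetical, combining the options mentioned above):
{code:java}
./bin/jobmanager.sh start --host host1 --webui-port 8081 -Dxx=yy
{code}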



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21618) Introduce a new integration test framework for SQL Client

2021-03-04 Thread Jark Wu (Jira)
Jark Wu created FLINK-21618:
---

 Summary: Introduce a new integration test framework for SQL Client
 Key: FLINK-21618
 URL: https://issues.apache.org/jira/browse/FLINK-21618
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Client
Reporter: Jark Wu
Assignee: Jark Wu
 Fix For: 1.13.0


Currently, adding tests for a new feature in SQL Client is difficult: there is 
no clear place to add them. Besides, all the tests in the SQL Client module 
are essentially unit tests; there are no integration tests. That's why we see 
so many little bugs in this module. 

An end-to-end component path of SQL Client is:
SqlClient => CliClient => parse sql => invoke 

For example, {{CliClientTest}} only tests the SQL parser in CliClient, and 
{{LocalExecutorITCase}} only tests {{LocalExecutor}}. Therefore, this issue 
aims to resolve this problem by introducing a new integration test framework 
that tests SQL Client end-to-end. 






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21617) FLIP-162: Consistent Flink SQL time function behavior

2021-03-04 Thread Leonard Xu (Jira)
Leonard Xu created FLINK-21617:
--

 Summary: FLIP-162: Consistent Flink SQL time function behavior
 Key: FLINK-21617
 URL: https://issues.apache.org/jira/browse/FLINK-21617
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API
Reporter: Leonard Xu


https://cwiki.apache.org/confluence/display/FLINK/FLIP-162%3A+Consistent+Flink+SQL+time+function+behavior



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21616) Introduce a new integration test framework for SQL Client

2021-03-04 Thread Jark Wu (Jira)
Jark Wu created FLINK-21616:
---

 Summary: Introduce a new integration test framework for SQL Client
 Key: FLINK-21616
 URL: https://issues.apache.org/jira/browse/FLINK-21616
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Client
Reporter: Jark Wu
Assignee: Jark Wu
 Fix For: 1.13.0


Currently, adding tests for a new feature in SQL Client is difficult: there is 
no clear place to add them. Besides, all the tests in the SQL Client module 
are essentially unit tests; there are no integration tests. That's why we see 
so many little bugs in this module. 

An end-to-end component path of SQL Client is:
SqlClient => CliClient => parse sql => invoke 

For example, {{CliClientTest}} only tests the SQL parser in CliClient, and 
{{LocalExecutorITCase}} only tests {{LocalExecutor}}. Therefore, this issue 
aims to resolve this problem by introducing a new integration test framework 
that tests SQL Client end-to-end. 






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[RESULT][VOTE] FLIP-162: Consistent Flink SQL time function behavior

2021-03-04 Thread Leonard Xu
Hi all,

The voting time for FLIP-162 has passed. I'm closing the vote now.

There were 4 +1 votes, 4 of which are binding:
- Jark (binding)
- Timo (binding)
- Kurt (binding)
- Jingsong (binding)

There were no -1 votes, thus, FLIP-162 has been accepted.

Thanks everyone!

Best,
Leonard


[jira] [Created] (FLINK-21614) Introduce a new integration test framework for SQL Client

2021-03-04 Thread Jark Wu (Jira)
Jark Wu created FLINK-21614:
---

 Summary: Introduce a new integration test framework for SQL Client
 Key: FLINK-21614
 URL: https://issues.apache.org/jira/browse/FLINK-21614
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Client
Reporter: Jark Wu
Assignee: Jark Wu
 Fix For: 1.13.0


Currently, adding tests for a new feature in SQL Client is difficult: there is 
no clear place to add them. Besides, all the tests in the SQL Client module 
are essentially unit tests; there are no integration tests. That's why we see 
so many little bugs in this module. 

An end-to-end component path of SQL Client is:
SqlClient => CliClient => parse sql => invoke 

For example, {{CliClientTest}} only tests the SQL parser in CliClient, and 
{{LocalExecutorITCase}} only tests {{LocalExecutor}}. Therefore, this issue 
aims to resolve this problem by introducing a new integration test framework 
that tests SQL Client end-to-end. 






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21615) Introduce a new integration test framework for SQL Client

2021-03-04 Thread Jark Wu (Jira)
Jark Wu created FLINK-21615:
---

 Summary: Introduce a new integration test framework for SQL Client
 Key: FLINK-21615
 URL: https://issues.apache.org/jira/browse/FLINK-21615
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Client
Reporter: Jark Wu
Assignee: Jark Wu
 Fix For: 1.13.0


Currently, adding tests for a new feature in SQL Client is difficult: there is 
no clear place to add them. Besides, all the tests in the SQL Client module 
are essentially unit tests; there are no integration tests. That's why we see 
so many little bugs in this module. 

An end-to-end component path of SQL Client is:
SqlClient => CliClient => parse sql => invoke 

For example, {{CliClientTest}} only tests the SQL parser in CliClient, and 
{{LocalExecutorITCase}} only tests {{LocalExecutor}}. Therefore, this issue 
aims to resolve this problem by introducing a new integration test framework 
that tests SQL Client end-to-end. 






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[DISCUSSION] Introduce a separated memory pool for the TM merge shuffle

2021-03-04 Thread Guowei Ma
Hi, all


In Flink 1.12 we introduced the TM merge shuffle, but the out-of-the-box
experience of using it is not very good. The main reason is
that the default configuration makes users run into OOMs [1]. So we
propose to introduce a managed memory pool for the TM merge shuffle to avoid
this problem.
Goals

   1. Don't affect the streaming and pipelined-shuffle-only batch setups.
   2. Don't mix memory with different life cycles in the same pool, e.g.
   write buffers needed by running tasks and read buffers needed even after
   tasks have finished.
   3. Users can use the TM merge shuffle with the default memory
   configuration. (Further tuning may be needed for performance optimization,
   but it should not fail with the defaults.)

Proposal

   1. Introduce a configuration `taskmanager.memory.network.batch-read` to
   specify the size of this memory pool. The default value is 16m.
   2. Allocate the pool lazily: the memory pool is allocated when the TM
   merge shuffle is used for the first time.
   3. The pool size will not be added to the TM's total memory size, but
   will be considered part of `taskmanager.memory.framework.off-heap.size`. We
   need to check that the pool size is not larger than the framework off-heap
   size if the TM merge shuffle is enabled (see the sketch below).
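
To make the check in point 3 concrete, here is a minimal plain-Java sketch
(method and parameter names are assumed, not actual Flink code):

{code:java}
// Minimal sketch of the proposed validation: when the TM merge shuffle is
// enabled, the lazily allocated batch-read pool must fit into the framework
// off-heap budget. All names here are illustrative.
static void validateBatchReadPool(long batchReadBytes, long frameworkOffHeapBytes) {
    if (batchReadBytes > frameworkOffHeapBytes) {
        throw new IllegalArgumentException(
            "'taskmanager.memory.network.batch-read' (" + batchReadBytes
                + " bytes) must not exceed "
                + "'taskmanager.memory.framework.off-heap.size' ("
                + frameworkOffHeapBytes + " bytes); "
                + "please increase the framework off-heap size.");
    }
}
{code}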


With this default configuration, allocating the memory pool is almost
impossible to fail. Currently the default framework off-heap memory is
128m, which is mainly used by Netty. But since we introduced zero copy, its
usage has been reduced; see the detailed data in [2].
Known Limitation
Usability for increasing the memory pool size

In addition to increasing `taskmanager.memory.network.batch-read`, the user
may also need to adjust `taskmanager.memory.framework.off-heap.size` at the
same time. This means that if the user forgets the latter, the check is
likely to fail when allocating the memory pool.


So in the following two situations, we will still prompt the user to
increase the size of `framework.off-heap.size`:

   1. `taskmanager.memory.network.batch-read` is bigger than
   `taskmanager.memory.framework.off-heap.size`.
   2. Allocating the pool runs into an OOM.


An alternative is that when the user adjusts the size of the memory pool,
the system automatically adjusts the framework off-heap size. But we are not
entirely sure about this, given its implicitness and how it complicates the
memory configuration.
Potential memory waste

In the first step, the memory pool will not be released once allocated. This
means that, even if there is no subsequent batch job, the
pooled memory cannot be used by other consumers.


We are not releasing the pool in the first step due to the concern that
frequently allocating/deallocating the entire pool may increase GC
pressure. Investigating how to dynamically release the pool when it is no
longer needed is considered a future follow-up.


Looking forward to your feedback.



[1] https://issues.apache.org/jira/browse/FLINK-20740

[2] https://github.com/apache/flink/pull/7368
Best,
Guowei


[jira] [Created] (FLINK-21613) Parse Compute Column with `IN` expression throws NPE

2021-03-04 Thread Shuo Cheng (Jira)
Shuo Cheng created FLINK-21613:
--

 Summary: Parse Compute Column with `IN` expression throws NPE
 Key: FLINK-21613
 URL: https://issues.apache.org/jira/browse/FLINK-21613
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API
Affects Versions: 1.13.0
Reporter: Shuo Cheng


Consider the following SQL:

{code:sql}
CREATE TABLE MyInputFormatTable (
  `a` INT,
  `b` BIGINT,
  `c` STRING,
  `d` as `c` IN ('Hi', 'Hello')
) WITH (
  'connector' = 'values',
  'data-id' = '$dataId',
  'runtime-source' = 'InputFormat'
)
{code}

An NPE will be thrown while parsing the query 
`select * from MyInputFormatTable`.

It seems the commit "[hotfix][table-planner-blink] Simplify SQL expression 
to RexNode conversion" introduced this problem. The hotfix uses the method 
`SqlToRelConverter#convertExpression`, which does not have any tests and is 
not used anywhere in Calcite, so relying on it is unsafe. Maybe reverting the 
hotfix is a good choice.

CC [~twalthr]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21612) Support StreamExecGroupAggregate json serialization/deserialization

2021-03-04 Thread godfrey he (Jira)
godfrey he created FLINK-21612:
--

 Summary: Support StreamExecGroupAggregate json 
serialization/deserialization
 Key: FLINK-21612
 URL: https://issues.apache.org/jira/browse/FLINK-21612
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: godfrey he
Assignee: godfrey he
 Fix For: 1.13.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-165: Operator's Flame Graphs

2021-03-04 Thread Alexander Fedulov
@Till, I've added the proposed ThreadInfoSamplesRequest and updated the
FLIP and the PR accordingly.

Best,

--

Alexander Fedulov | Solutions Architect



Follow us @VervericaData

--

Join Flink Forward  - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Yip Park Tung Jason, Jinwei (Kevin) Zhang, Karl Anton
Wehner



On Wed, Mar 3, 2021 at 5:03 PM Alexander Fedulov 
wrote:

> Added docs to the PR.
> @David, thanks for the tip, it seems like a good place to put them.
>
>
> On Wed, Mar 3, 2021 at 12:10 PM David Anderson 
> wrote:
>
>> This is going to make performance analysis and optimization much more
>> accessible. I can't wait to include this in our training courses.
>>
>> +1
>>
>> Seth suggested putting the docs for this feature under
>> Operations/Monitoring, but there's already a page in the docs under
>> Operations/Debugging for Application Profiling & Debugging, which is more
>> on point. I think it will be confusing if the flame graphs aren't
>> together with the other profilers.
>>
>> David
>>
>> On Tue, Mar 2, 2021 at 11:36 PM Seth Wiesman  wrote:
>>
>> > Cool feature +1
>> >
>> > There is a subsection called monitoring in the operations section of the
>> > docs. It would fit nicely there.
>> >
>> > Seth
>> >
>> > On Tue, Mar 2, 2021 at 4:23 PM Alexander Fedulov <
>> alexan...@ververica.com>
>> > wrote:
>> >
>> > > Hi Piotr,
>> > >
>> > > Thanks for the comments - all valid points.
>> > > We should definitely document how the Flame Graphs are constructed - I
>> > will
>> > > work on the docs. Do you have a proposition about the part of which
>> > > page/section they should become?
>> > > I would like to also mention here that I plan to work on further
>> > > improvements, such as the ability to "zoom in" into the Flame Graphs
>> for
>> > > the individual Tasks in the "unsquashed" form,  so some of those
>> concerns
>> > > should be mitigated in the future.
>> > >
>> > > Best,
>> > >
>> > >
>> > >
>> > > On Tue, Mar 2, 2021 at 3:17 PM Piotr Nowojski 
>> > > wrote:
>> > >
>> > > > Nice feature +1 from my side for it.
>> > > >
>> > > > In the PR I think we are missing documentation. I think it's
>> especially
>> > > > important to mention the limitations of this approach for
>> performance
>> > > > analysis. If we make it easy for the user to get such kind of data,
>> > it's
>> > > > important they do not proverbially shoot themselves in their own
>> foot
>> > > with
>> > > > false conclusions. We should clearly mention how those data are
>> > sampled,
>> > > > and point to limitations such as:
>> > > > - data from all threads/operators are squashed together, so if there
>> > is a
>> > > > data skew it will be averaged out
>> > > > - stack sampling is/can be biased (JVM threads are more likely to be
>> > > > stopped in some places than others, while skipping/rarely stopping
>> in
>> > the
>> > > > true hot spots - so one should treat the results with a grain of
>> salt
>> > > below
>> > > > a certain level)
>> > > > - true bottleneck might be obscured by the fact flame graphs are
>> > > squashing
>> > > > results from all of the threads. There can be 60% of time spent in
>> one
>> > > > component according to a flame graph, but it might not necessarily
>> be
>> > the
>> > > > bottleneck, which could be in a completely different component
>> running
>> > > > which has a single thread burning 100% of a single CPU core, barely
>> > > visible
>> > > > in the flame graph at all.
>> > > >
>> > > > It's great to have such a nice tool readily and easily available,
>> but
>> > we
>> > > > need to make sure people who are using it are aware when it can be
>> > > > misleading.
>> > > >
>> > > > Piotrek
>> > > >
>> > > > wt., 2 mar 2021 o 15:12 Till Rohrmann 
>> > napisał(a):
>> > > >
>> > > > > Ah ok. Thanks for the clarification Alex.
>> > > > >
>> > > > > Cheers,
>> > > > > Till
>> > > > >
>> > > > > On Tue, Mar 2, 2021 at 2:02 PM Alexander Fedulov <
>> > > > alexan...@ververica.com>
>> > > > > wrote:
>> > > > >
>> > > > > > It is passed back as part of the response to the asynchronous
>> > > callback
>> > > > > > within the coordinator and is used to decide if all outstanding
>> > > > requests
>> > > > > to
>> > > > > > the parallel instances of a particular operator returned
>> > > successfully.
>> > > > If
>> > > > > > so, the request is considered successful, sub-results are
>> combined
>> > > and
>> > > > > the
>> > > > > > thread info result future for an operator completes.
>> > > > > >
>> > > > > >
>> 

[jira] [Created] (FLINK-21611) Japicmp fails with "Could not resolve org.apache.flink:flink-core:jar:1.12.0"

2021-03-04 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-21611:


 Summary: Japicmp fails with "Could not resolve 
org.apache.flink:flink-core:jar:1.12.0"
 Key: FLINK-21611
 URL: https://issues.apache.org/jira/browse/FLINK-21611
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 1.13.0
Reporter: Dawid Wysakowicz


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=14139=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=9b1a0f88-517b-5893-fc93-76f4670982b4

{code}
[ERROR] Failed to execute goal 
com.github.siom79.japicmp:japicmp-maven-plugin:0.11.0:cmp (default) on project 
flink-core: Could not resolve org.apache.flink:flink-core:jar:1.12.0 -> [Help 1]
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21610) ProcessFailureCancelingITCase failed on azure

2021-03-04 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-21610:


 Summary: ProcessFailureCancelingITCase failed on azure
 Key: FLINK-21610
 URL: https://issues.apache.org/jira/browse/FLINK-21610
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.13.0
Reporter: Dawid Wysakowicz


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=14115=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=f508e270-48d6-5f1e-3138-42a17e0714f0

{code}
2021-03-04T11:38:44.9321152Z [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
Skipped: 0, Time elapsed: 16.463 s <<< FAILURE! - in 
org.apache.flink.test.recovery.ProcessFailureCancelingITCase
2021-03-04T11:38:44.9323887Z [ERROR] 
testCancelingOnProcessFailure(org.apache.flink.test.recovery.ProcessFailureCancelingITCase)
  Time elapsed: 15.692 s  <<< ERROR!
2021-03-04T11:38:44.9334234Z java.lang.NullPointerException
2021-03-04T11:38:44.9335222Zat 
org.apache.flink.test.recovery.ProcessFailureCancelingITCase.testCancelingOnProcessFailure(ProcessFailureCancelingITCase.java:247)
2021-03-04T11:38:44.9335994Zat 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2021-03-04T11:38:44.9336640Zat 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2021-03-04T11:38:44.9337423Zat 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2021-03-04T11:38:44.9341553Zat 
java.lang.reflect.Method.invoke(Method.java:498)
2021-03-04T11:38:44.9342587Zat 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
2021-03-04T11:38:44.9343392Zat 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2021-03-04T11:38:44.9343896Zat 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
2021-03-04T11:38:44.9344396Zat 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2021-03-04T11:38:44.9345092Zat 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
2021-03-04T11:38:44.9345529Zat 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
2021-03-04T11:38:44.9345954Zat 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
2021-03-04T11:38:44.9346374Zat 
org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
2021-03-04T11:38:44.9346896Zat 
org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
2021-03-04T11:38:44.9347636Zat 
org.junit.rules.RunRules.evaluate(RunRules.java:20)
2021-03-04T11:38:44.9350380Zat 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
2021-03-04T11:38:44.9351222Zat 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
2021-03-04T11:38:44.9352002Zat 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
2021-03-04T11:38:44.9353195Zat 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
2021-03-04T11:38:44.9353879Zat 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
2021-03-04T11:38:44.9354538Zat 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
2021-03-04T11:38:44.9355220Zat 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
2021-03-04T11:38:44.9355890Zat 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
2021-03-04T11:38:44.9356548Zat 
org.junit.runners.ParentRunner.run(ParentRunner.java:363)
2021-03-04T11:38:44.9357252Zat 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
2021-03-04T11:38:44.9358062Zat 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
2021-03-04T11:38:44.9358869Zat 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
2021-03-04T11:38:44.9359742Zat 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
2021-03-04T11:38:44.9360593Zat 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
2021-03-04T11:38:44.9361416Zat 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
2021-03-04T11:38:44.9362156Zat 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
2021-03-04T11:38:44.9362977Zat 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21609) SimpleRecoveryITCaseBase.testRestartMultipleTimes fails on azure

2021-03-04 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-21609:


 Summary: SimpleRecoveryITCaseBase.testRestartMultipleTimes fails 
on azure
 Key: FLINK-21609
 URL: https://issues.apache.org/jira/browse/FLINK-21609
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.13.0
Reporter: Dawid Wysakowicz


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=14115=logs=5c8e7682-d68f-54d1-16a2-a09310218a49=f508e270-48d6-5f1e-3138-42a17e0714f0

{code}
2021-03-04T11:39:35.8958609Z [ERROR] 
testRestartMultipleTimes(org.apache.flink.test.recovery.SimpleRecoveryExponentialDelayRestartStrategyITBase)
  Time elapsed: 1.753 s  <<< FAILURE!
2021-03-04T11:39:35.8959196Z java.lang.AssertionError: expected:<55> but 
was:<143>
2021-03-04T11:39:35.8959667Zat org.junit.Assert.fail(Assert.java:88)
2021-03-04T11:39:35.8959989Zat 
org.junit.Assert.failNotEquals(Assert.java:834)
2021-03-04T11:39:35.8962924Zat 
org.junit.Assert.assertEquals(Assert.java:645)
2021-03-04T11:39:35.8964175Zat 
org.junit.Assert.assertEquals(Assert.java:631)
2021-03-04T11:39:35.8964980Zat 
org.apache.flink.test.recovery.SimpleRecoveryITCaseBase.testRestartMultipleTimes(SimpleRecoveryITCaseBase.java:165)
2021-03-04T11:39:35.8965750Zat 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2021-03-04T11:39:35.8966503Zat 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2021-03-04T11:39:35.8967221Zat 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2021-03-04T11:39:35.8967902Zat 
java.lang.reflect.Method.invoke(Method.java:498)
2021-03-04T11:39:35.8968526Zat 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
2021-03-04T11:39:35.8969842Zat 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2021-03-04T11:39:35.8970492Zat 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
2021-03-04T11:39:35.8971209Zat 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2021-03-04T11:39:35.8971878Zat 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
2021-03-04T11:39:35.8972603Zat 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
2021-03-04T11:39:35.8973268Zat 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
2021-03-04T11:39:35.8973874Zat 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
2021-03-04T11:39:35.8974416Zat 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
2021-03-04T11:39:35.8987184Zat 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
2021-03-04T11:39:35.8987899Zat 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
2021-03-04T11:39:35.8988531Zat 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
2021-03-04T11:39:35.8989129Zat 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
2021-03-04T11:39:35.8989801Zat 
org.junit.rules.RunRules.evaluate(RunRules.java:20)
2021-03-04T11:39:35.8990307Zat 
org.junit.runners.ParentRunner.run(ParentRunner.java:363)
2021-03-04T11:39:35.8991100Zat 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
2021-03-04T11:39:35.8991907Zat 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
2021-03-04T11:39:35.8993251Zat 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
2021-03-04T11:39:35.8993978Zat 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
2021-03-04T11:39:35.8994705Zat 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
2021-03-04T11:39:35.8995465Zat 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
2021-03-04T11:39:35.8996117Zat 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
2021-03-04T11:39:35.8996759Zat 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21608) YARNHighAvailabilityITCase.testClusterClientRetrieval fails with "There is at least one application..."

2021-03-04 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-21608:


 Summary: YARNHighAvailabilityITCase.testClusterClientRetrieval 
fails with "There is at least one application..."
 Key: FLINK-21608
 URL: https://issues.apache.org/jira/browse/FLINK-21608
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN, Tests
Affects Versions: 1.13.0
Reporter: Dawid Wysakowicz


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=14108=logs=fc5181b0-e452-5c8f-68de-1097947f6483=62110053-334f-5295-a0ab-80dd7e2babbf

{code}
2021-03-04T10:45:47.3523930Z INFO: Binding 
org.apache.hadoop.yarn.webapp.GenericExceptionHandler to 
GuiceManagedComponentProvider with the scope "Singleton"
2021-03-04T10:45:47.4240091Z Mar 04, 2021 10:45:47 AM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
getComponentProvider
2021-03-04T10:45:47.4241009Z INFO: Binding 
org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebServices to 
GuiceManagedComponentProvider with the scope "Singleton"
2021-03-04T10:47:53.6102867Z [ERROR] Tests run: 3, Failures: 1, Errors: 0, 
Skipped: 0, Time elapsed: 132.302 s <<< FAILURE! - in 
org.apache.flink.yarn.YARNHighAvailabilityITCase
2021-03-04T10:47:53.6103745Z [ERROR] 
testClusterClientRetrieval(org.apache.flink.yarn.YARNHighAvailabilityITCase)  
Time elapsed: 15.906 s  <<< FAILURE!
2021-03-04T10:47:53.6104784Z java.lang.AssertionError: There is at least one 
application on the cluster that is not finished.[App 
application_1614854744820_0003 is in state RUNNING.]
2021-03-04T10:47:53.6106075Zat org.junit.Assert.fail(Assert.java:88)
2021-03-04T10:47:53.6108977Zat 
org.apache.flink.yarn.YarnTestBase$CleanupYarnApplication.close(YarnTestBase.java:322)
2021-03-04T10:47:53.6109784Zat 
org.apache.flink.yarn.YarnTestBase.runTest(YarnTestBase.java:286)
2021-03-04T10:47:53.6110493Zat 
org.apache.flink.yarn.YARNHighAvailabilityITCase.testClusterClientRetrieval(YARNHighAvailabilityITCase.java:219)
2021-03-04T10:47:53.6111446Zat 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2021-03-04T10:47:53.6111871Zat 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2021-03-04T10:47:53.6112360Zat 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2021-03-04T10:47:53.6112784Zat 
java.lang.reflect.Method.invoke(Method.java:498)
2021-03-04T10:47:53.6113210Zat 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
2021-03-04T10:47:53.6114001Zat 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2021-03-04T10:47:53.6114796Zat 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
2021-03-04T10:47:53.6115388Zat 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2021-03-04T10:47:53.6116123Zat 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
2021-03-04T10:47:53.6116995Zat 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
2021-03-04T10:47:53.6117810Zat 
org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
2021-03-04T10:47:53.6118621Zat 
org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
2021-03-04T10:47:53.6119311Zat 
org.junit.rules.RunRules.evaluate(RunRules.java:20)
2021-03-04T10:47:53.6119840Zat 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
2021-03-04T10:47:53.6120279Zat 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
2021-03-04T10:47:53.6120739Zat 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
2021-03-04T10:47:53.6121173Zat 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
2021-03-04T10:47:53.6121692Zat 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
2021-03-04T10:47:53.6122128Zat 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
2021-03-04T10:47:53.6122594Zat 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
2021-03-04T10:47:53.6123005Zat 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
2021-03-04T10:47:53.6123432Zat 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
2021-03-04T10:47:53.6123978Zat 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
2021-03-04T10:47:53.6124425Zat 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
2021-03-04T10:47:53.6124839Zat 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
2021-03-04T10:47:53.6125264Zat 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
2021-03-04T10:47:53.6125657Zat 
org.junit.rules.RunRules.evaluate(RunRules.java:20)
2021-03-04T10:47:53.6126017Z

Re: [VOTE] FLIP-151: Incremental snapshots for heap-based state backend

2021-03-04 Thread Yu Li
+1 (binding)

The latest FLIP document LGTM. Thanks for driving this Roman!

Best Regards,
Yu


On Thu, 4 Mar 2021 at 18:24, David Anderson  wrote:

> +1 (non-binding)
>
> On Mon, Mar 1, 2021 at 10:12 AM Roman Khachatryan 
> wrote:
>
> > Hi everyone,
> >
> > since the discussion [1] about FLIP-151 [2] seems to have reached a
> > consensus, I'd like to start a formal vote for the FLIP.
> >
> > Please vote +1 to approve the FLIP, or -1 with a comment. The vote will
> be
> > open at least until Wednesday, Mar 3rd.
> >
> > [1] https://s.apache.org/flip-151-discussion
> > [2]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-151%3A+Incremental+snapshots+for+heap-based+state+backend
> >
> > Regards,
> > Roman
> >
>


[jira] [Created] (FLINK-21607) Twitter table source connector

2021-03-04 Thread Jeff Zhang (Jira)
Jeff Zhang created FLINK-21607:
--

 Summary: Twitter table source connector
 Key: FLINK-21607
 URL: https://issues.apache.org/jira/browse/FLINK-21607
 Project: Flink
  Issue Type: New Feature
Affects Versions: 1.12.2
Reporter: Jeff Zhang


It would be nice to have a Flink Twitter table source connector. This would be 
especially useful for demoing Flink SQL examples. 
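
Such a connector could, for example, be exposed through DDL along the 
following lines (purely hypothetical; the connector name and all options are 
invented for illustration):

{code:java}
// Purely hypothetical sketch of a Twitter table source DDL; no such
// connector or options exist yet.
tableEnv.executeSql(
    "CREATE TABLE tweets ("
        + "  id BIGINT,"
        + "  `text` STRING,"
        + "  created_at TIMESTAMP(3)"
        + ") WITH ("
        + "  'connector' = 'twitter',"
        + "  'credentials' = '...'"
        + ")");
{code}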

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21606) TaskManager connected to invalid JobManager leading to TaskSubmissionException

2021-03-04 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-21606:
--

 Summary: TaskManager connected to invalid JobManager leading to 
TaskSubmissionException
 Key: FLINK-21606
 URL: https://issues.apache.org/jira/browse/FLINK-21606
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.13.0
Reporter: Robert Metzger
 Fix For: 1.13.0


While testing reactive mode, I had to restart my JobManager a few times to get 
the configuration right. While doing that, I had at least one TaskManager 
(TM6) that was first connected to the first JobManager (with a running job), 
and then to the second one.

On the second JobManager, I was able to execute my test job (on another 
TaskManager (TMx)). But once TM6 reconnected and reactive mode tried to 
utilize all available resources, I repeatedly ran into this issue:

{code}
2021-03-04 15:49:36,322 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - 
Window(GlobalWindows(), DeltaTrigger, TimeEvictor, ComparableAggregator, 
PassThroughWindowFunction) -> Sink: Print to Std. Out (5/7) 
(ae8f39c8dd88148aff93c8f811fab22e) switched from DEPLOYING to FAILED on 
192.168.2.173:64041-4f7521 @ macbook-pro-2.localdomain (dataPort=64044).
java.util.concurrent.CompletionException: 
org.apache.flink.runtime.taskexecutor.exceptions.TaskSubmissionException: Could 
not submit task because there is no JobManager associated for the job 
bbe8634736b5b1d813dd322cfaaa08ea.
at 
java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326) 
~[?:1.8.0_252]
at 
java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
 ~[?:1.8.0_252]
at 
java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:925) 
~[?:1.8.0_252]
at 
java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:913)
 ~[?:1.8.0_252]
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) 
~[?:1.8.0_252]
at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
 ~[?:1.8.0_252]
at 
org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$0(AkkaInvocationHandler.java:234)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
 ~[?:1.8.0_252]
at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
 ~[?:1.8.0_252]
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) 
~[?:1.8.0_252]
at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
 ~[?:1.8.0_252]
at 
org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:1064)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
at akka.dispatch.OnComplete.internal(Future.scala:263) 
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
at akka.dispatch.OnComplete.internal(Future.scala:261) 
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:191) 
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:188) 
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36) 
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
at 
org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:73)
 ~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
at 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44) 
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
at 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252) 
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:572) 
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:101) 
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
at 
akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:999) 
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
at akka.actor.Actor$class.aroundReceive(Actor.scala:517) 
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:458) 
~[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
at akka.actor.ActorCell.invoke(ActorCell.scala:561) 
[flink-dist_2.11-1.13-SNAPSHOT.jar:1.13-SNAPSHOT]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) 

Re: [DISCUSS] Deprecation and removal of the legacy SQL planner

2021-03-04 Thread Jark Wu
big +1 from my side.

Best,
Jark

On Thu, 4 Mar 2021 at 20:59, Leonard Xu  wrote:

> +1 for the roadmap.
>
> Thanks Timo for driving this.
>
> Best,
> Leonard
>
> > 在 2021年3月4日,20:40,Timo Walther  写道:
> >
> > Last call for feedback on this topic.
> >
> > It seems everyone agrees to finally complete FLIP-32. Since FLIP-32 has
> been accepted for a very long time, I think we don't need another voting
> thread for executing the last implementation step. Please let me know if
> you think differently.
> >
> > I will start deprecating the affected classes and interfaces beginning
> of next week.
> >
> > Regards,
> > Timo
> >
> >
> > On 26.02.21 15:46, Seth Wiesman wrote:
> >> Strong +1
> >> Having two planners is confusing to users and the diverging semantics
> make
> >> it difficult to provide useful learning material. It is time to rip the
> >> bandage off.
> >> Seth
> >> On Fri, Feb 26, 2021 at 12:54 AM Kurt Young  wrote:
> >>>
> >>> Hi Timo,
> >>>
> >>> First of all I want to thank you for introducing this planner design
> back
> >>> in 1.9, this is a great work
> >>> that allows lots of blink features to be merged to Flink in a
> reasonably
> >>> short time. It greatly
> >>> accelerates the evolution speed of Table & SQL.
> >>>
> >>> Everything comes with a cost, as you said, right now we are facing the
> >>> overhead of maintaining
> >>> two planners and it causes bugs and also increases imbalance between
> these
> >>> two planners. As
> >>> a developer and also for the good of all Table & SQL users, I also
> think
> >>> it's better for us to be more
> >>> focused on a single planner.
> >>>
> >>> Your proposed roadmap looks good to me, +1 from my side and thanks
> >>> again for all your efforts!
> >>>
> >>> Best,
> >>> Kurt
> >>>
> >>>
> >>> On Thu, Feb 25, 2021 at 5:01 PM Timo Walther 
> wrote:
> >>>
>  Hi everyone,
> 
>  since Flink 1.9 we have supported two SQL planners. Most of the
> original
>  plan of FLIP-32 [1] has been implemented. The Blink code merge has
> been
>  completed and many additional features have been added exclusively to
>  the new planner. The new planner is now in a much better shape than
> the
>  legacy one.
> 
>  In order to avoid user confusion, reduce duplicate code, and improve
>  maintainability and testing times of the Flink project as a whole we
>  would like to propose the following steps to complete FLIP-32:
> 
>  In Flink 1.13:
>  - Deprecate the `flink-table-planner` module
>  - Deprecate `BatchTableEnvironment` for both Java, Scala, and Python
> 
>  In Flink 1.14:
>  - Drop `flink-table-planner` early
>  - Drop many deprecated interfaces and API on demand
>  - Rename `flink-table-planner-blink` to `flink-table-planner`
>  - Rename `flink-table-runtime-blink` to `flink-table-runtime`
>  - Remove references of "Blink" in the code base
> 
>  This will have an impact on users that still use DataSet API together
>  with Table API. With this change we will not support converting
> between
>  DataSet API and Table API anymore. We hope to compensate the missing
>  functionality in the new unified TableEnvironment and/or the batch
> mode
>  in DataStream API during 1.14 and 1.15. For this, we are looking for
>  further feedback which features are required in Table API/DataStream
> API
>  to have a smooth migration path.
> 
>  Looking forward to your feedback.
> 
>  Regards,
>  Timo
> 
>  [1]
> 
> 
> >>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions
> 
> >>>
> >
>
>


Re: Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-03-04 Thread Piotr Nowojski
Hi Yun and Kezhu,

> 1. We might introduce a new type of event to notify the endOfInput()
> through the graph first, and then A/B waits for the final
> checkpoint, then A emits EndOfPartitionEvent to exit all the tasks as now.

As I mentioned in one of the PRs, I would opt for this solution.

> If we go towards 1, the tasks would still need to exit from the source,
> and if we go towards 2/3, we could allow these
> tasks to finish first.

Is this a problem? Pipeline would be empty, so EndOfPartitionEvent would be
traveling very quickly.

> should we also need to do it for normal exit

Yes, but aren't we doing it right now anyway?
`StreamSource#advanceToEndOfEventTime`?

> If so, since now for recovery after some tasks finished we would first
> start all the tasks and stop the finished tasks directly

Is this the plan? That upon recovery we are restarting all operators, even
those that have already finished? Certainly it's one of the possibilities.

> For example, the global committer handler might write to the meta store for
> FileSystem/Hive sinks, and these writes should happen after all the pending
> files are renamed to the final ones; otherwise the downstream jobs might
> miss some files if they rely on the meta store to identify ready
> partitions.
> Thus we would have to emit the global-committer-handler after
> notifyCheckpointComplete. But since we can know the list of files
> to rename in snapshotState(), we could create the
> global-committer-handler and store them there.

Keep in mind that those are two separate things, as I mentioned in a
previous e-mail:
> II. When should the `GlobalCommitHandle` be created? Should it be
returned from `snapshotState()`, `notifyCheckpointComplete()` or somewhere
else?
> III. What should be the ordering guarantee between global commit and
local commit, if any? Actually the easiest to implement would be undefined,
but de facto global commit happening before local commits (first invoke
`notifyCheckpointComplete()` on the `OperatorCoordinator` and either after
or in parallel send `notifyCheckpointComplete()` RPCs). As far as I can
tell, undefined order should work for the use cases that I'm aware of.

We could create the `GlobalCommitHandle` in
`StreamOperator#snapshotState()`, while we could also ensure that
`notifyCheckpointComplete()` is called on the `OperatorCoordinator` AFTER
all of the operators have successfully processed
`notifyCheckpointComplete()`. This would be more difficult to implement,
hence I would prefer "undefined" behaviour here, but it's probably possible.
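
For illustration, a minimal sketch of that ordering, assuming a hypothetical
`GlobalCommitHandle` type and operator hooks (none of this is existing Flink
API):

{code:java}
import java.io.Serializable;
import java.util.Optional;

// Hypothetical sketch only: the type and both hooks below are assumptions
// from this discussion, not actual Flink interfaces.
interface GlobalCommitHandle extends Serializable {}

interface TwoPhaseCommittingOperatorSketch {

    // The handle is created while snapshotting, so the CheckpointCoordinator
    // can persist it as part of the checkpoint and recover it after a failure.
    Optional<GlobalCommitHandle> snapshotState(long checkpointId) throws Exception;

    // The handle is only acted upon (the global commit, e.g. a meta-store
    // update) once the checkpoint is complete.
    void notifyCheckpointComplete(long checkpointId) throws Exception;
}
{code}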

Kezhu:

> Is `endOfInput` in your proposal same as `BoundedOneInput.endInput` ?

Yes, it's the same. Sorry for the typo, somehow I was convinced it's
`endOfInput`, not simply `endInput` :)

Piotrek

czw., 4 mar 2021 o 11:09 Kezhu Wang  napisał(a):

> Hi Piotrek,
>
> For “end-flushing”, I presented no new candidates. I use “end-flushing”
> to avoid naming issues. It should be “close” in allowing checkpoint after
> “close” while “endOfInput” in your approach. What I tried to express is we
> can do some backporting work to mitigate possible breaking changes in your
> approach.
>
> To check whether I understand your approach correctly, I
> want to confirm some things to avoid misleading from my side:
> * Is `endOfInput` in your proposal same as `BoundedOneInput.endInput` ?
>
> I (also ?) think there are overlaps between `BoundedOneInput.endInput` and
> `close`. If yes, I like your proposal to squash them into one and deprecate
> `BoundedOneInput.endInput`.
>
> > Also, `global-commit-handles` will need to be part of the checkpoint.
>
> I have to admit that I overlooked this, sorry for that. In my previous
> suggestion, I assumed `global-commit-handles` are not part of the checkpoint.
> The checkpoint will complete and persist as before. The changed parts come after
> the checkpoint is considered completed.
>
>
> Best,
> Kezhu Wang
>
>
> On March 4, 2021 at 17:15:53, Piotr Nowojski (pnowoj...@apache.org) wrote:
>
> Hi Kezhu,
>
> What do you mean by “end-flushing”? I was suggesting to just keep
> `endOfInput()` and `dispose()`. Are you suggesting to have one
> `endFlushing()` method that is called after quiescing timers/mailbox, but
> before the final checkpoint and `dispose()`? Are we sure we really need this
> extra call? Note: if we don't need it at the moment, we could always
> introduce it in the future, while if we don't and won't need it, why
> complicate the API?
>
> About the idea of returning the global-commit-handle from
> `notifyCheckpointComplete()` call. Note it will be more difficult to
> implement, as `CheckpointCoordinator` will need to have one extra stage of
> waiting for some actions to complete. Implementation will probably be
> easier if we return the global-commit-handle from `snapshotState()` call.
>
> Also, `global-commit-handles` will need to be part of the checkpoint. They
> will need to be restored/recovered in case of failure. Because of that it
> might be actually impossible to 

[jira] [Created] (FLINK-21605) Uniform the format of property name in the flink configuration file.

2021-03-04 Thread Jira
张超明 created FLINK-21605:
---

 Summary: Uniform the format of property name in the flink 
configuration file.
 Key: FLINK-21605
 URL: https://issues.apache.org/jira/browse/FLINK-21605
 Project: Flink
  Issue Type: Improvement
Reporter: 张超明


h2. Now, there are several kinds of formats in the Flink configuration file.
 * Some are in camel-case:
{code:yaml}
high-availability.storageDir: hdfs:///flink/ha/
{code}
* Some use a separator:
{code:yaml}
historyserver.archive.fs.refresh-interval: 1
{code}

It's advisable for us to unify these formats in the next release version.
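
A hedged sketch of the unified, separator-based style ("storage-dir" is a
hypothetical key used only for illustration, not an existing option):
{code:yaml}
# hypothetical unified spelling of the camel-case key above
high-availability.storage-dir: hdfs:///flink/ha/
# already follows the separator style
historyserver.archive.fs.refresh-interval: 1
{code}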



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Deprecation and removal of the legacy SQL planner

2021-03-04 Thread Leonard Xu
+1 for the roadmap.

Thanks Timo for driving this.

Best,
Leonard

> On March 4, 2021 at 20:40, Timo Walther wrote:
> 
> Last call for feedback on this topic.
> 
> It seems everyone agrees to finally complete FLIP-32. Since FLIP-32 has been 
> accepted for a very long time, I think we don't need another voting thread 
> for executing the last implementation step. Please let me know if you think 
> differently.
> 
> I will start deprecating the affected classes and interfaces beginning of 
> next week.
> 
> Regards,
> Timo
> 
> 
> On 26.02.21 15:46, Seth Wiesman wrote:
>> Strong +1
>> Having two planners is confusing to users and the diverging semantics make
>> it difficult to provide useful learning material. It is time to rip the
>> bandage off.
>> Seth
>> On Fri, Feb 26, 2021 at 12:54 AM Kurt Young  wrote:
>>> 
>>> Hi Timo,
>>> 
>>> First of all I want to thank you for introducing this planner design back
>>> in 1.9, this is a great work
>>> that allows lots of blink features to be merged to Flink in a reasonably
>>> short time. It greatly
>>> accelerates the evolution speed of Table & SQL.
>>> 
>>> Everything comes with a cost, as you said, right now we are facing the
>>> overhead of maintaining
>>> two planners and it causes bugs and also increases imbalance between these
>>> two planners. As
>>> a developer and also for the good of all Table & SQL users, I also think
>>> it's better for us to be more
>>> focused on a single planner.
>>> 
>>> Your proposed roadmap looks good to me, +1 from my side and thanks
>>> again for all your efforts!
>>> 
>>> Best,
>>> Kurt
>>> 
>>> 
>>> On Thu, Feb 25, 2021 at 5:01 PM Timo Walther  wrote:
>>> 
 Hi everyone,
 
 since Flink 1.9 we have supported two SQL planners. Most of the original
 plan of FLIP-32 [1] has been implemented. The Blink code merge has been
 completed and many additional features have been added exclusively to
 the new planner. The new planner is now in a much better shape than the
 legacy one.
 
 In order to avoid user confusion, reduce duplicate code, and improve
 maintainability and testing times of the Flink project as a whole we
 would like to propose the following steps to complete FLIP-32:
 
 In Flink 1.13:
 - Deprecate the `flink-table-planner` module
 - Deprecate `BatchTableEnvironment` for Java, Scala, and Python
 
 In Flink 1.14:
 - Drop `flink-table-planner` early
 - Drop many deprecated interfaces and APIs on demand
 - Rename `flink-table-planner-blink` to `flink-table-planner`
 - Rename `flink-table-runtime-blink` to `flink-table-runtime`
 - Remove references of "Blink" in the code base
 
 This will have an impact on users that still use DataSet API together
 with Table API. With this change we will not support converting between
 DataSet API and Table API anymore. We hope to compensate the missing
 functionality in the new unified TableEnvironment and/or the batch mode
 in DataStream API during 1.14 and 1.15. For this, we are looking for
 further feedback on which features are required in Table API/DataStream API
 to have a smooth migration path.
 
 Looking forward to your feedback.
 
 Regards,
 Timo
 
 [1]
 
 
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions
 
>>> 
> 



Re: [DISCUSS] Deprecation and removal of the legacy SQL planner

2021-03-04 Thread Timo Walther

Last call for feedback on this topic.

It seems everyone agrees to finally complete FLIP-32. Since FLIP-32 has 
been accepted for a very long time, I think we don't need another voting 
thread for executing the last implementation step. Please let me know if 
you think differently.


I will start deprecating the affected classes and interfaces beginning 
of next week.
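
For users still on the legacy planner, switching to the blink planner
explicitly looks roughly like this (a hedged sketch: the EnvironmentSettings
calls are the existing API, the example table is made up):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class BlinkPlannerMigration {
    public static void main(String[] args) {
        // Explicitly select the blink planner, which is the one that will
        // remain after FLIP-32 is completed.
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();
        TableEnvironment tEnv = TableEnvironment.create(settings);
        // A made-up table, just to show the environment is usable as before.
        tEnv.executeSql(
                "CREATE TABLE src (a INT) WITH ('connector' = 'datagen')");
    }
}
{code}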


Regards,
Timo


On 26.02.21 15:46, Seth Wiesman wrote:

Strong +1

Having two planners is confusing to users and the diverging semantics make
it difficult to provide useful learning material. It is time to rip the
bandage off.

Seth

On Fri, Feb 26, 2021 at 12:54 AM Kurt Young  wrote:




Hi Timo,

First of all I want to thank you for introducing this planner design back
in 1.9, this is a great work
that allows lots of blink features to be merged to Flink in a reasonably
short time. It greatly
accelerates the evolution speed of Table & SQL.

Everything comes with a cost, as you said, right now we are facing the
overhead of maintaining
two planners and it causes bugs and also increases imbalance between these
two planners. As
a developer and also for the good of all Table & SQL users, I also think
it's better for us to be more
focused on a single planner.

Your proposed roadmap looks good to me, +1 from my side and thanks
again for all your efforts!

Best,
Kurt


On Thu, Feb 25, 2021 at 5:01 PM Timo Walther  wrote:


Hi everyone,

since Flink 1.9 we have supported two SQL planners. Most of the original
plan of FLIP-32 [1] has been implemented. The Blink code merge has been
completed and many additional features have been added exclusively to
the new planner. The new planner is now in a much better shape than the
legacy one.

In order to avoid user confusion, reduce duplicate code, and improve
maintainability and testing times of the Flink project as a whole we
would like to propose the following steps to complete FLIP-32:

In Flink 1.13:
- Deprecate the `flink-table-planner` module
- Deprecate `BatchTableEnvironment` for Java, Scala, and Python

In Flink 1.14:
- Drop `flink-table-planner` early
- Drop many deprecated interfaces and APIs on demand
- Rename `flink-table-planner-blink` to `flink-table-planner`
- Rename `flink-table-runtime-blink` to `flink-table-runtime`
- Remove references of "Blink" in the code base

This will have an impact on users that still use DataSet API together
with Table API. With this change we will not support converting between
DataSet API and Table API anymore. We hope to compensate the missing
functionality in the new unified TableEnvironment and/or the batch mode
in DataStream API during 1.14 and 1.15. For this, we are looking for
further feedback on which features are required in Table API/DataStream API
to have a smooth migration path.

Looking forward to your feedback.

Regards,
Timo

[1]



https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions










[jira] [Created] (FLINK-21604) Remove adaptive scheduler nightly test profile, replace with more specific testing approach

2021-03-04 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-21604:
--

 Summary: Remove adaptive scheduler nightly test profile, replace 
with more specific testing approach
 Key: FLINK-21604
 URL: https://issues.apache.org/jira/browse/FLINK-21604
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Reporter: Robert Metzger
 Fix For: 1.13.0


We are currently running all tests with the adaptive scheduler, but only once a 
night.
This is wasteful for all tests not using any scheduler. 

Ideally, we come up with a testing strategy where we run all relevant tests 
(checkpointing, failure, recovery, etc.) with both schedulers.

One approach would be parameterizing some tests.
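
A hedged sketch of that approach (the scheduler names are placeholders, not
actual configuration values):

{code:java}
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

@RunWith(Parameterized.class)
public class SchedulerParameterizedTestSketch {

    @Parameterized.Parameters(name = "scheduler = {0}")
    public static Object[] schedulers() {
        // Placeholder values; the real test would map these to the
        // corresponding scheduler configuration.
        return new Object[] {"adaptive", "default"};
    }

    @Parameterized.Parameter
    public String scheduler;

    @Test
    public void checkpointingWorksWithScheduler() {
        // Set up a MiniCluster with the given scheduler and run the
        // checkpointing/failure/recovery scenario here.
    }
}
{code}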



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-21603) SQLClientKafkaITCase failed due to unexpected end of file

2021-03-04 Thread Yun Tang (Jira)
Yun Tang created FLINK-21603:


 Summary: SQLClientKafkaITCase failed due to unexpected end of file
 Key: FLINK-21603
 URL: https://issues.apache.org/jira/browse/FLINK-21603
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka, Tests
Affects Versions: 1.13.0
Reporter: Yun Tang


https://myasuka.visualstudio.com/flink/_build/results?buildId=252=logs=9401bf33-03c4-5a24-83fe-e51d75db73ef=72901ab2-7cd0-57be-82b1-bca51de20fba


{code:java}

[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 498.992 
s <<< FAILURE! - in org.apache.flink.tests.util.kafka.SQLClientKafkaITCase
[ERROR] testKafka[0: kafka-version:2.4.1 
kafka-sql-version:universal](org.apache.flink.tests.util.kafka.SQLClientKafkaITCase)
  Time elapsed: 448.465 s  <<< ERROR!
java.io.IOException: 
Process execution failed due error. Error output:
gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

at 
org.apache.flink.tests.util.AutoClosableProcess$AutoClosableProcessBuilder.runBlocking(AutoClosableProcess.java:133)
at 
org.apache.flink.tests.util.AutoClosableProcess$AutoClosableProcessBuilder.runBlocking(AutoClosableProcess.java:108)
at 
org.apache.flink.tests.util.AutoClosableProcess.runBlocking(AutoClosableProcess.java:70)
at 
org.apache.flink.tests.util.kafka.LocalStandaloneKafkaResource.setupKafkaDist(LocalStandaloneKafkaResource.java:112)
at 
org.apache.flink.tests.util.kafka.LocalStandaloneKafkaResource.before(LocalStandaloneKafkaResource.java:102)
at 
org.apache.flink.util.ExternalResource$1.evaluate(ExternalResource.java:46)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at 
org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:27)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at 
org.apache.flink.util.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:27)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.apache.maven.surefire.junitcore.JUnitCore.run(JUnitCore.java:55)
at 
org.apache.maven.surefire.junitcore.JUnitCoreWrapper.createRequestAndRun(JUnitCoreWrapper.java:137)
at 
org.apache.maven.surefire.junitcore.JUnitCoreWrapper.executeEager(JUnitCoreWrapper.java:107)
at 
org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:83)
at 
org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:75)
at 
org.apache.maven.surefire.junitcore.JUnitCoreProvider.invoke(JUnitCoreProvider.java:158)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 

Re: [VOTE] FLIP-151: Incremental snapshots for heap-based state backend

2021-03-04 Thread David Anderson
+1 (non-binding)

On Mon, Mar 1, 2021 at 10:12 AM Roman Khachatryan  wrote:

> Hi everyone,
>
> since the discussion [1] about FLIP-151 [2] seems to have reached a
> consensus, I'd like to start a formal vote for the FLIP.
>
> Please vote +1 to approve the FLIP, or -1 with a comment. The vote will be
> open at least until Wednesday, Mar 3rd.
>
> [1] https://s.apache.org/flip-151-discussion
> [2]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-151%3A+Incremental+snapshots+for+heap-based+state+backend
>
> Regards,
> Roman
>


Re: Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-03-04 Thread Kezhu Wang
Hi Piotrek,

For “end-flushing”, I presented no new candidates. I use “end-flushing” to
avoid naming issues. It should be “close” in allowing checkpoint after
“close” while “endOfInput” in your approach. What I tried to express is we
can do some backporting work to mitigate possible breaking changes in your
approach.

To check whether I understand your approach correctly, I
want to confirm some things to avoid misleading from my side:
* Is `endOfInput` in your proposal same as `BoundedOneInput.endInput` ?

I (also ?) think there are overlaps between `BoundedOneInput.endInput` and
`close`. If yes, I like your proposal to squash them into one and deprecate
`BoundedOneInput.endInput`.

> Also, `global-commit-handles` will need to be part of the checkpoint.

I have to admit that I overlooked this, sorry for that. In my previous
suggestion, I assumed `global-commit-handles` are not part of the checkpoint.
The checkpoint will complete and persist as before. The changed parts come after
the checkpoint is considered completed.


Best,
Kezhu Wang


On March 4, 2021 at 17:15:53, Piotr Nowojski (pnowoj...@apache.org) wrote:

Hi Kezhu,

What do you mean by “end-flushing”? I was suggesting to just keep
`endOfInput()` and `dispose()`. Are you suggesting to have one
`endFlushing()` method that is called after quiescing timers/mailbox, but
before the final checkpoint and `dispose()`? Are we sure we really need this
extra call? Note: if we don't need it at the moment, we could always
introduce it in the future, while if we don't and won't need it, why
complicate the API?

About the idea of returning the global-commit-handle from
`notifyCheckpointComplete()` call. Note it will be more difficult to
implement, as `CheckpointCoordinator` will need to have one extra stage of
waiting for some actions to complete. Implementation will probably be
easier if we return the global-commit-handle from `snapshotState()` call.

Also, `global-commit-handles` will need to be part of the checkpoint. They
will need to be restored/recovered in case of failure. Because of that it
might be actually impossible to implement those handles as returned from
`notifyCheckpointComplete()`. In this solution we would be in a precarious
position if the main checkpoint succeeded, CheckpointCoordinator would
start issuing `notifyCheckpointComplete()`, but persisting of the handles
would fail/keep failing. How would we recover from such a situation? We can
not recover to a previous checkpoint (`notifyCheckpointComplete()` were
already issued), but at the same time the current checkpoint is not fully
completed (global-commit-handles can not be checkpointed).

Best,
Piotrek



czw., 4 mar 2021 o 06:33 Kezhu Wang  napisał(a):

> Hi all,
>
> Glad to see convergence here and FLINK-21133:
> 1. We all prefer a single final checkpoint for the task, not individual
> checkpoints for each operator.
> 2. To achieve the above goal, if we have to break something, we will.
> 3. Don’t allow emitting records in `notifyCheckpointComplete`.
>
> For “end-flushing”, I think both approaches should function in reality,
> but we also have options/responsibilities to mitigate effect of breaking
> changes:
> A. Allowing checkpoint after “close”. Introduce config option to forbid
> this during migrating releases.
> B. Renaming “close” to “other-end-flushing-method”. We can backport the
> newly introduced “end-flushing” (as an empty default method) to earlier
> releases in following patch releases. The backported “end-flushing” will
> be called just before “close” in future patch releases. We could call
> “close” just before “dispose" in future releases and `final` it in
> `AbstractStreamOperator` when ready (to break user-side code).
>
> If a breaking change for this “end-flushing” is inevitable, I kind of
> prefer the renaming and backport approach. It is a chance for us to rethink the
> whole thing and discard the misleading “close” (currently it is mixed/misused
> with “end-flushing” and “cleanup-resource”, though the javadoc claims only
> “end-flushing”; this could also be considered a bug).
>
> Besides this, will FLIP-147 eventually need some ways to decide whether an
> operator needs a final checkpoint @Yun @Guowei? @Arvid mentioned this in
> an earlier mail.
>
>
> For the two phase commit, @Piotrek I like your idea. I think that
> “commit-handle” could be returned to the checkpoint-coordinator through
> `notifyCheckpointComplete`. This way the “commit-handle” might be reused
> by the operator-coordinator’s `notifyCheckpointComplete`. Suppose the following
> changes:
>
> 1. `CompletableFuture<GlobalCommitHandle> notifyCheckpointCompleteAsync()`
> in operator.
> 2. `CompletableFuture<Void> notifyCheckpointCompleteAsync(Map<..., CompletableFuture<GlobalCommitHandle>> subtasks)` in operator coordinator.
>
> These changes need support from:
> * Checkpoint coordinator level to bridge operator and coordinator through
> task
> * Operator level to compat existing `notifyCheckpointComplete`
>
> The checkpoint procedure will look like:
> 1. Trigger checkpoint for operator 

[jira] [Created] (FLINK-21602) Creation of ExecutionGraph happens in main thread

2021-03-04 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-21602:
-

 Summary: Creation of ExecutionGraph happens in main thread
 Key: FLINK-21602
 URL: https://issues.apache.org/jira/browse/FLINK-21602
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Affects Versions: 1.13.0
Reporter: Till Rohrmann
 Fix For: 1.13.0


Currently, the {{AdaptiveScheduler}} creates the {{ExecutionGraph}} in the main 
thread. This also means that we are recovering state in the {{JobMaster's}} 
main thread. This can lead to instabilities because we are blocking the main 
thread for too long. I think the creation of the {{ExecutionGraph}} should 
happen in an {{ioExecutor}}.
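
A hedged sketch of the suggested hand-off (all names are placeholders; only
the executor pattern is the point):

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class GraphCreationSketch {

    // Build the graph (including state recovery) on the ioExecutor, then
    // hand the result back to the JobMaster's main thread.
    static <G> CompletableFuture<Void> createOffMainThread(
            Supplier<G> createExecutionGraph,
            Consumer<G> applyInMainThread,
            Executor ioExecutor,
            Executor mainThreadExecutor) {
        return CompletableFuture
                .supplyAsync(createExecutionGraph, ioExecutor)
                .thenAcceptAsync(applyInMainThread, mainThreadExecutor);
    }
}
{code}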



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-03-04 Thread Yun Gao
Hi all,

Many thanks for all the discussions!

Regarding the operator API on exit, I'm also glad that we are roughly reaching 
consistency. Based on the previous discussions
I also support doing the final checkpoint after the previous "close" method, 
and renaming these methods to make them
clearer. But I still have concerns about how we achieve the final checkpoint 
after emitting the EndOfPartitionEvent, which 
has closed the network channels. Suppose we have A -> B and both A and B need 
to wait for the final checkpoint; there might
be several options from my side:
1. We might introduce a new type of event to notify the endOfInput() through the 
graph first; then A/B waits for the final 
checkpoint, and then A emits EndOfPartitionEvent to exit all the tasks as now 
(a sketch of such an event follows this list).
2. We directly use EndOfPartitionEvent to notify the endOfInput() through the 
graph first, but we also notify the JM (or at least the 
CheckpointCoordinator) that these tasks are in a new state TERMINATING (or 
FINISHING). Then when triggering a checkpoint,
the CheckpointCoordinator would treat these tasks differently by not 
considering the edges that emitted EndOfPartitionEvent. In this
case, we would trigger A and B separately and there is no need for A to 
broadcast the barriers to B.
3. We still use EndOfPartitionEvent to notify the endOfInput() through the graph 
first and do not notify the JM. Then for all the checkpoints 
we would need to trigger all the running tasks. In this case, both A and B 
are running and the CheckpointCoordinator does not know 
whether the network between them is closed, so it has to assume the worst 
case and trigger both A and B. But this option would
introduce more overhead compared with only triggering the sources and 
broadcasting barriers on the task side.
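
A hedged sketch of the dedicated event from option 1 (the event name is an
assumption from this discussion; only the RuntimeEvent base class is existing
Flink code):

{code:java}
import java.io.IOException;

import org.apache.flink.core.memory.DataInputView;
import org.apache.flink.core.memory.DataOutputView;
import org.apache.flink.runtime.event.RuntimeEvent;

// Travels through the graph before the final checkpoint, while
// EndOfPartitionEvent is only sent afterwards to tear the tasks down.
public class EndOfUserRecordsEvent extends RuntimeEvent {

    @Override
    public void write(DataOutputView out) throws IOException {
        // No payload; the arrival of the event on a channel is the signal.
    }

    @Override
    public void read(DataInputView in) throws IOException {
        // Nothing to deserialize.
    }
}
{code}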

As for whether we allow the operators and tasks that do not need to commit side 
effects to exit first, I think it is also related to the above
options: if we go towards 1, the tasks would still need to exit from the 
source, and if we go towards 2/3, we could allow these
tasks to finish first.

Regarding the watermark,
> There is a pre-existing issue
> that watermarks are not checkpointed/stored on state, and there was/is no
> clear answer how we should handle this as far as I remember. One
> problematic case are two/multiple input tasks or UnionInputGate, where
> combined watermark is the min of all inputs (held in memory). The problem
> so far is a bit benign, as after recovery we are losing the combined
> watermark value, but it's being slowly/lazily restored, as new watermarks
> are sent from the sources. With finished sources that won't be a case.

Thanks for the further explanation; it should indeed be a problem. Since 
for stop-with-savepoint --drain we always
advance the watermark to MAX, should we also do it for normal exit? If 
so, since for recovery after some tasks finished
we would first start all the tasks and stop the finished tasks directly, I 
think for simplicity we could first emit a new MAX watermark 
from the sources before or with the EndOfPartitionEvent as Till suggested, and in 
the long run we could also consider snapshotting the
min watermark if we are not going to start the finished tasks directly.
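
A minimal sketch of that idea (the holder class is a placeholder; only
`Watermark.MAX_WATERMARK` is existing Flink API):

{code:java}
import org.apache.flink.streaming.api.watermark.Watermark;

final class DrainOnFinishSketch {

    // Emitted before (or together with) EndOfPartitionEvent, mirroring what
    // stop-with-savepoint --drain does today.
    static Watermark watermarkToEmitOnFinish() {
        // MAX_WATERMARK carries Long.MAX_VALUE and fires all remaining
        // event-time timers downstream.
        return Watermark.MAX_WATERMARK;
    }
}
{code}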

Regarding the global-commit-handles,
I also like the proposal of the global-committer-handler. From the sinks' view 
I also lean towards emitting these handlers after notifyCheckpointComplete, 
but we could create these handlers in snapshotState() so that we could also 
include them in the checkpoint.

 For example, the global committer handler might write to the meta store for 
FileSystem/Hive sinks, and this should happen after all the pending
files are renamed to the final ones; otherwise the downstream jobs might miss 
some files if they rely on the meta store to identify ready partitions.
Thus we would have to emit the global-committer-handler after 
notifyCheckpointComplete. But since we can know the list of files
to rename in snapshotState(), we could create the global-committer-handlers and 
store them there. 

Also, since we might want to keep the order, the operator coordinator might not 
rely on its own notifyCheckpointComplete() notification,
but wait for the operator to notify it about checkpoint completion after 
the operator has finished its processing first.

Best,
Yun



--
From: Piotr Nowojski
Send Time: 2021 Mar. 4 (Thu.) 17:16
To: Kezhu Wang
Cc: dev; Yun Gao; jingsongl...@gmail.com; Guowei Ma; Till Rohrmann
Subject:Re: Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

Hi Kezhu,

What do you mean by “end-flushing”? I was suggesting to just keep 
`endOfInput()` and `dispose()`. Are you suggesting to have a one 
`endFlushing()` method, that is called after quiescing timers/mailbox, but 
before final checkpoint and `dispose()`? Are we sure we really need this extra 
call? Note. If we don't need it at 

Re: Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-03-04 Thread Piotr Nowojski
Hi Kezhu,

What do you mean by “end-flushing”? I was suggesting to just keep
`endOfInput()` and `dispose()`. Are you suggesting to have one
`endFlushing()` method that is called after quiescing timers/mailbox, but
before the final checkpoint and `dispose()`? Are we sure we really need this
extra call? Note: if we don't need it at the moment, we could always
introduce it in the future, while if we don't and won't need it, why
complicate the API?

About the idea of returning the global-commit-handle from
`notifyCheckpointComplete()` call. Note it will be more difficult to
implement, as `CheckpointCoordinator` will need to have one extra stage of
waiting for some actions to complete. Implementation will probably be
easier if we return the global-commit-handle from `snapshotState()` call.

Also, `global-commit-handles` will need to be part of the checkpoint. They
will need to be restored/recovered in case of failure. Because of that it
might be actually impossible to implement those handles as returned from
`notifyCheckpointComplete()`. In this solution we would be in a precarious
position if the main checkpoint succeeded, CheckpointCoordinator would
start issuing `notifyCheckpointComplete()`, but persisting of the handles
would fail/keep failing. How would we recover from such a situation? We can
not recover to a previous checkpoint (`notifyCheckpointComplete()` were
already issued), but at the same time the current checkpoint is not fully
completed (global-commit-handles can not be checkpointed).

Best,
Piotrek



czw., 4 mar 2021 o 06:33 Kezhu Wang  napisał(a):

> Hi all,
>
> Glad to see convergence here and FLINK-21133:
> 1. We all prefer a single final checkpoint for the task, not individual
> checkpoints for each operator.
> 2. To achieve the above goal, if we have to break something, we will.
> 3. Don’t allow emitting records in `notifyCheckpointComplete`.
>
> For “end-flushing”, I think both approaches should function in reality,
> but we also have options/responsibilities to mitigate effect of breaking
> changes:
> A. Allowing checkpoint after “close”. Introduce config option to forbid
> this during migrating releases.
> B. Renaming “close” to “other-end-flushing-method”. We can backport the
> newly introduced “end-flushing” (as an empty default method) to earlier
> releases in following patch releases. The backported “end-flushing” will
> be called just before “close” in future patch releases. We could call
> “close” just before “dispose" in future releases and `final` it in
> `AbstractStreamOperator` when ready (to break user-side code).
>
> If a breaking change for this “end-flushing” is inevitable, I kind of
> prefer the renaming and backport approach. It is a chance for us to rethink the
> whole thing and discard the misleading “close” (currently it is mixed/misused
> with “end-flushing” and “cleanup-resource”, though the javadoc claims only
> “end-flushing”; this could also be considered a bug).
>
> Besides this, will FLIP-147 eventually need some ways to decide whether an
> operator needs a final checkpoint @Yun @Guowei? @Arvid mentioned this in
> an earlier mail.
>
>
> For the two phase commit, @Piotrek I like your idea. I think that
> “commit-handle” could be returned to the checkpoint-coordinator through
> `notifyCheckpointComplete`. This way the “commit-handle” might be reused
> by the operator-coordinator’s `notifyCheckpointComplete`. Suppose the following
> changes:
>
> 1. `CompletableFuture<GlobalCommitHandle> notifyCheckpointCompleteAsync()`
> in operator.
> 2. `CompletableFuture<Void> notifyCheckpointCompleteAsync(Map<..., CompletableFuture<GlobalCommitHandle>> subtasks)` in operator coordinator.
>
> These changes need support from:
> * Checkpoint coordinator level to bridge operator and coordinator through
> task
> * Operator level to compat existing `notifyCheckpointComplete`
>
> The checkpoint procedure will look like:
> 1. Trigger checkpoint for operator coordinator.
> 2. If above succeeds, trigger tasks checkpoint. Abort otherwise.
> 3. If all above succeeds, complete current checkpoint. Abort otherwise.
> 4. If the job fails afterwards, restore from the above “completed” checkpoint.
> 5. Notify checkpoint completion to tasks.
> 6. Notify checkpoint completion to coordinators.
> 7. Wait for step#5 and step#6 to succeed. Now it is really completed. Either
> this succeeds or the job fails in the meantime; there may be other concurrent conditions.
>
> With these changes, migrating the FLIP-143 sink to the operator coordinator should
> be easy.
>
> It will definitely complicate the already complex checkpoint coordinator, as
> @Till mentioned in FLINK-21133.
>
>
> Best,
> Kezhu Wang
>
> On March 3, 2021 at 01:09:50, Piotr Nowojski (pnowoj...@apache.org) wrote:
>
> Hi,
>
> Thanks for reminding me. I think FLIP-147 will have to deal in one way or
> another with the (re?)emitting MAX_WATERMARK. There is a pre-existing
> issue
> that watermarks are not checkpointed/stored on state, and there was/is no
> clear answer how we should handle this as far as I remember. One
> problematic case are two/multiple input tasks 
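
For illustration, a hedged sketch of the asynchronous notification signatures
Kezhu proposes above (the generic types and the subtask key are assumptions
reconstructed from the thread, not existing Flink API):

{code:java}
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical type from this thread.
interface GlobalCommitHandle extends Serializable {}

interface OperatorSideSketch {
    // Operator: completes once the local commit work for the checkpoint has
    // finished, yielding a handle for the global commit.
    CompletableFuture<GlobalCommitHandle> notifyCheckpointCompleteAsync(long checkpointId);
}

interface CoordinatorSideSketch {
    // Operator coordinator: receives the per-subtask futures (keyed here by
    // subtask index, an assumption) and completes once the global commit,
    // e.g. a meta-store update, is done.
    CompletableFuture<Void> notifyCheckpointCompleteAsync(
            long checkpointId,
            Map<Integer, CompletableFuture<GlobalCommitHandle>> subtasks);
}
{code}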

[jira] [Created] (FLINK-21601) observing errors in jobmanager - after created prometheus metric reporter

2021-03-04 Thread Bhagi (Jira)
Bhagi created FLINK-21601:
-

 Summary: observing errors in jobmanager - after created prometheus 
metric reporter
 Key: FLINK-21601
 URL: https://issues.apache.org/jira/browse/FLINK-21601
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes
Affects Versions: 1.12.0
Reporter: Bhagi
 Fix For: 1.12.0
 Attachments: image-2021-03-04-14-23-35-915.png

Hi Team,

I created the Prometheus metric reporter for monitoring Flink metrics.

After the necessary configuration was done for the reporter, the Flink job managers can send 
data to Prometheus.

 

But I am able to see error information regarding the Prometheus endpoint connection.

I have attached the logs; please look into it.

metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9090



--
This message was sent by Atlassian Jira
(v8.3.4#803005)