[jira] [Created] (FLINK-16300) Rework SchedulerTestUtils with testing classes to replace Mockito usages

2020-02-26 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-16300:
---

 Summary: Rework SchedulerTestUtils with testing classes to 
replace Mockito usages
 Key: FLINK-16300
 URL: https://issues.apache.org/jira/browse/FLINK-16300
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination, Tests
Affects Versions: 1.11.0
Reporter: Zhu Zhu
Assignee: Zhu Zhu
 Fix For: 1.11.0


Mockito is used in SchedulerTestUtils to mock ExecutionVertex and Execution for 
testing. It fails to mock every getter, so other tests that use it may encounter 
NPE issues, e.g. with ExecutionVertex#getID().
Mockito is also discouraged in Flink tests. So I'd propose to rework the utils 
with testing classes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-100: Add Attempt Information

2020-02-26 Thread Yadong Xie
Hi Till

We keep the response as a flattened SubtaskTimeInfo plus an array of
SubtaskTimeInfo to keep the REST API backward compatible, since Flink users
may still need the API ('/jobs/{jobId}/vertices/{vertexId}/subtasktimes') to
get the flattened SubtaskTimeInfo.

If we want the API to return an array of SubtaskTimeInfo, a new URL needs
to be created rather than reusing the old one.

Both solutions are ok for me. What do you think about it?

Till Rohrmann  于2020年2月26日周三 下午10:53写道:

> Fair enough. If this should become a problem we could introduce it later as
> well.
>
> What about changing the SubtasksTimeInfo response type into an array of
> SubtaskTimeInfo? At the moment SubtasksTimeInfo contains a
> flattened SubtaskTimeInfo and an array of SubtaskTimeInfo for the previous
> attempts.
>
> Cheers,
> Till
>
> On Wed, Feb 26, 2020 at 1:16 PM Yadong Xie  wrote:
>
> > Hi Till
> >
> > Thanks for your comments.
> >
> > > I have a comment concerning the SubtasksTimesHandler
> >
> > It would be much easier for the frontend to handle a large amount of data
> > if we had a REST API filter parameter, but in my opinion, the attempt list
> > data is not large enough that we have to rely on REST API parameters for
> > paging; we can still handle it all in the frontend.
> >
> > Users can filter the attempt list by the status (scheduled/created/deploying
> > and so on) and other keywords (attempt_id and so on) directly in the
> > frontend, since all the data is returned by the REST API.
> > If we moved some of the filter parameters to the REST API path parameters,
> > all the other filter parameters would need to be moved too.
> >
> > I suggest adding an attempt id filter in the UI to help users filter for the
> > desired attempt, with all the filtering running inside the browser.
> > What do you think about this?
> >
> >
> >
> >
> > Till Rohrmann  于2020年2月25日周二 下午11:40写道:
> >
> > > Hi Yadong,
> > >
> > > thanks for creating this FLIP. I like the idea of making the web UI
> > > information richer with regard to subtask attempt information.
> > >
> > > I have a comment concerning the SubtasksTimesHandler: Should we change
> > the
> > > response type SubtasksTimeInfo so that it simply contains an
> > > array of SubtaskTimeInfo? One could add an attempt range path parameter
> > to
> > > the SubtasksTimesHandler to be able to control which attempts will be
> > > returned.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Feb 25, 2020 at 9:57 AM Benchao Li 
> wrote:
> > >
> > > > Hi Yadong,
> > > >
> > > > Thanks for the updating.  LGTM now.
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Yadong Xie  于2020年2月25日周二 下午4:41写道:
> > > >
> > > > > Hi Kurt
> > > > >
> > > > > There will be no differences between batch jobs and stream jobs at
> > > > > the subtask-attempt level in the UI.
> > > > > The only differences are in the vertex timeline. I have added a
> > > > > screenshot of a batch job to FLIP-100, since a batch job disappears
> > > > > from the list soon after it finishes.
> > > > > here is the link:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-100%3A+Add+Attempt+Information
> > > > >
> > > > >
> > > > > Kurt Young  于2020年2月21日周五 上午11:51写道:
> > > > >
> > > > > > Hi Yadong,
> > > > > >
> > > > > > Thanks for the proposal, it's a useful feature, especially for
> > > > > > batch jobs. But according to the examples you gave, I can't tell
> > > > > > whether I got the required information from that.
> > > > > > Can you replace the demo job with a more complex batch job, so we
> > > > > > can see some differences in the start/stop times of different
> > > > > > tasks and attempts?
> > > > > >
> > > > > > Best,
> > > > > > Kurt
> > > > > >
> > > > > >
> > > > > > On Thu, Feb 20, 2020 at 5:46 PM Yadong Xie 
> > > > wrote:
> > > > > >
> > > > > > > Hi all
> > > > > > >
> > > > > > > I want to start the vote for FLIP-100, which proposes to add
> > > attempt
> > > > > > > information inside subtask and timeline in web UI.
> > > > > > >
> > > > > > > To help everyone better understand the proposal, we spent some
> > > > efforts
> > > > > on
> > > > > > > making an online POC
> > > > > > >
> > > > > > > Timeline Attempt (click the vertex timeline to see the
> > > differences):
> > > > > > > previous web:
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://101.132.122.69:8081/#/job/9d651769488466d33e7a607e85203543/timeline
> > > > > > > POC web:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://101.132.122.69:8081/web/#/job/9d651769488466d33e7a607e85203543/timeline
> > > > > > >
> > > > > > > Subtask Attempt (click the vertex and switch to subtask tab to
> > see
> > > > the
> > > > > > > differences):
> > > > > > > previous web:
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://101.132.122.69:8081/#/job/9d651769488466d33e7a607e85203543/overview
> > > > > > > POC web:
> > > > > > >
> > 

[jira] [Created] (FLINK-16299) Release containers recovered from previous attempt in which TaskExecutor is not started.

2020-02-26 Thread Xintong Song (Jira)
Xintong Song created FLINK-16299:


 Summary: Release containers recovered from previous attempt in 
which TaskExecutor is not started.
 Key: FLINK-16299
 URL: https://issues.apache.org/jira/browse/FLINK-16299
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Reporter: Xintong Song


As discussed in FLINK-16215, on Yarn deployment, {{YarnResourceManager}} starts 
a new {{TaskExecutor}} in two steps:
 # Request a new container from Yarn
 # Start a {{TaskExecutor}} process in the allocated container

If JM failover happens between the two steps, in the new attempt 
{{YarnResourceManager}} will not start {{TaskExecutor}} processes in recovered 
containers. That means such containers are neither used nor released.

A potential fix to this problem is to query the container status by calling 
{{NMClientAsync#getContainerStatusAsync}}, release the containers whose state is 
{{NEW}}, and keep only those whose state is {{RUNNING}}, waiting for them to 
register.
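A sketch of what this could look like (wiring simplified and error handling 
omitted; the callback method below would be invoked from an 
{{NMClientAsync.CallbackHandler}}):

{code:java}
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerState;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;
import org.apache.hadoop.yarn.client.api.async.NMClientAsync;

class RecoveredContainerCheck {

    private final NMClientAsync nmClient;
    private final AMRMClientAsync<?> rmClient;

    RecoveredContainerCheck(NMClientAsync nmClient, AMRMClientAsync<?> rmClient) {
        this.nmClient = nmClient;
        this.rmClient = rmClient;
    }

    /** Ask the NodeManagers for the current status of each recovered container. */
    void checkRecoveredContainers(Iterable<Container> recovered) {
        for (Container container : recovered) {
            nmClient.getContainerStatusAsync(container.getId(), container.getNodeId());
        }
    }

    /** Would be called from NMClientAsync.CallbackHandler#onContainerStatusReceived. */
    void onContainerStatusReceived(ContainerId containerId, ContainerStatus status) {
        if (status.getState() == ContainerState.NEW) {
            // No TaskExecutor was ever started in this container: give it back.
            rmClient.releaseAssignedContainer(containerId);
        }
        // RUNNING containers are kept; wait for their TaskExecutors to register.
    }
}
{code}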



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[VOTE] FLIP-84: Improve & Refactor API of TableEnvironment

2020-02-26 Thread godfrey he
Hi everyone,

I'd like to start the vote for FLIP-84 [1], which proposes to deprecate some
old APIs and introduce some new APIs in TableEnvironment. This FLIP was
discussed and reached consensus in the discussion thread [2].

The vote will be open for at least 72 hours. Unless there is an objection,
I will try to close it by Mar 1, 2020 07:00 UTC if we have received
sufficient votes.


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-84%3A+Improve+%26+Refactor+API+of+TableEnvironment

[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Improve-amp-Refactor-API-of-Table-Module-td34537.html


Bests,
Godfrey


[jira] [Created] (FLINK-16298) GroupWindowTableAggregateITCase.testEventTimeTumblingWindow fails on Travis

2020-02-26 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-16298:
---

 Summary: 
GroupWindowTableAggregateITCase.testEventTimeTumblingWindow fails on Travis
 Key: FLINK-16298
 URL: https://issues.apache.org/jira/browse/FLINK-16298
 Project: Flink
  Issue Type: Test
  Components: Tests
Reporter: Yingjie Cao


GroupWindowTableAggregateITCase.testEventTimeTumblingWindow fails on Travis. 
link: [https://api.travis-ci.com/v3/job/291610383/log.txt]

stack:
{code:java}
05:38:01.976 [ERROR] Tests run: 18, Failures: 0, Errors: 1, Skipped: 0, Time 
elapsed: 7.537 s <<< FAILURE! - in 
org.apache.flink.table.planner.runtime.stream.table.GroupWindowTableAggregateITCase
05:38:01.976 [ERROR] 
testEventTimeTumblingWindow[StateBackend=HEAP](org.apache.flink.table.planner.runtime.stream.table.GroupWindowTableAggregateITCase)
  Time elapsed: 0.459 s  <<< ERROR!
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at 
org.apache.flink.table.planner.runtime.stream.table.GroupWindowTableAggregateITCase.testEventTimeTumblingWindow(GroupWindowTableAggregateITCase.scala:151)
Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by 
FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=1, 
backoffTimeMS=0)
Caused by: java.lang.Exception: Artificial Failure
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16297) Remove the redundant indentation and blank lines in highlight code blocks

2020-02-26 Thread Yangze Guo (Jira)
Yangze Guo created FLINK-16297:
--

 Summary: Remove the redundant indentation and blank lines in highlight 
code blocks
 Key: FLINK-16297
 URL: https://issues.apache.org/jira/browse/FLINK-16297
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Yangze Guo
 Attachments: 屏幕快照 2020-02-27 下午2.01.51.png

Currently, there are lots of redundant indents and blank lines in highlight code 
blocks of the docs. Such as
 !屏幕快照 2020-02-27 下午2.01.51.png! 
 in 
https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/batch/connectors.html.
The root cause is the invalid indentation of \{% highlight %\} and \{% endhighlight 
%\} tags. We need to fix this to improve the reading experience.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-26 Thread Yuan Mei
Congrats!

Best
Yuan

On Thu, Feb 27, 2020 at 8:48 AM Guowei Ma  wrote:

> Congratulations !!
> Best,
> Guowei
>
>
> Yun Tang  于2020年2月27日周四 上午2:11写道:
>
> > Congratulations and well deserved!
> >
> >
> > Best
> > Yun Tang
> > 
> > From: Canbin Zheng 
> > Sent: Monday, February 24, 2020 16:07
> > To: dev 
> > Subject: Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
> >
> > Congratulations !!
> >
> > Dawid Wysakowicz  于2020年2月24日周一 下午3:55写道:
> >
> > > Congratulations Jingsong!
> > >
> > > Best,
> > >
> > > Dawid
> > >
> > > On 24/02/2020 08:12, zhenya Sun wrote:
> > > > Congratulations!!!
> > > > | |
> > > > zhenya Sun
> > > > |
> > > > |
> > > > toke...@126.com
> > > > |
> > > > 签名由网易邮箱大师定制
> > > >
> > > >
> > > > On 02/24/2020 14:35,Yu Li wrote:
> > > > Congratulations Jingsong! Well deserved.
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > >
> > > > On Mon, 24 Feb 2020 at 14:10, Congxian Qiu 
> > > wrote:
> > > >
> > > > Congratulations Jingsong!
> > > >
> > > > Best,
> > > > Congxian
> > > >
> > > >
> > > > jincheng sun  于2020年2月24日周一 下午1:38写道:
> > > >
> > > > Congratulations Jingsong!
> > > >
> > > > Best,
> > > > Jincheng
> > > >
> > > >
> > > > Zhu Zhu  于2020年2月24日周一 上午11:55写道:
> > > >
> > > > Congratulations Jingsong!
> > > >
> > > > Thanks,
> > > > Zhu Zhu
> > > >
> > > > Fabian Hueske  于2020年2月22日周六 上午1:30写道:
> > > >
> > > > Congrats Jingsong!
> > > >
> > > > Cheers, Fabian
> > > >
> > > > Am Fr., 21. Feb. 2020 um 17:49 Uhr schrieb Rong Rong <
> > > > walter...@gmail.com>:
> > > >
> > > > Congratulations Jingsong!!
> > > >
> > > > Cheers,
> > > > Rong
> > > >
> > > > On Fri, Feb 21, 2020 at 8:45 AM Bowen Li 
> wrote:
> > > >
> > > > Congrats, Jingsong!
> > > >
> > > > On Fri, Feb 21, 2020 at 7:28 AM Till Rohrmann  > > >
> > > > wrote:
> > > >
> > > > Congratulations Jingsong!
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Fri, Feb 21, 2020 at 4:03 PM Yun Gao 
> > > > wrote:
> > > >
> > > > Congratulations Jingsong!
> > > >
> > > > Best,
> > > > Yun
> > > >
> > > > --
> > > > From:Jingsong Li 
> > > > Send Time:2020 Feb. 21 (Fri.) 21:42
> > > > To:Hequn Cheng 
> > > > Cc:Yang Wang ; Zhijiang <
> > > > wangzhijiang...@aliyun.com>; Zhenghua Gao ;
> > > > godfrey
> > > > he ; dev ; user <
> > > > u...@flink.apache.org>
> > > > Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
> > > >
> > > > Thanks everyone~
> > > >
> > > > It's my pleasure to be part of the community. I hope I can make a
> > > > better
> > > > contribution in future.
> > > >
> > > > Best,
> > > > Jingsong Lee
> > > >
> > > > On Fri, Feb 21, 2020 at 2:48 PM Hequn Cheng 
> > > > wrote:
> > > > Congratulations Jingsong! Well deserved.
> > > >
> > > > Best,
> > > > Hequn
> > > >
> > > > On Fri, Feb 21, 2020 at 2:42 PM Yang Wang 
> > > > wrote:
> > > > Congratulations, Jingsong! Well deserved.
> > > >
> > > >
> > > > Best,
> > > > Yang
> > > >
> > > > Zhijiang  于2020年2月21日周五 下午1:18写道:
> > > > Congrats Jingsong! Welcome on board!
> > > >
> > > > Best,
> > > > Zhijiang
> > > >
> > > > --
> > > > From:Zhenghua Gao 
> > > > Send Time:2020 Feb. 21 (Fri.) 12:49
> > > > To:godfrey he 
> > > > Cc:dev ; user 
> > > > Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
> > > >
> > > > Congrats Jingsong!
> > > >
> > > >
> > > > *Best Regards,*
> > > > *Zhenghua Gao*
> > > >
> > > >
> > > > On Fri, Feb 21, 2020 at 11:59 AM godfrey he 
> > > > wrote:
> > > > Congrats Jingsong! Well deserved.
> > > >
> > > > Best,
> > > > godfrey
> > > >
> > > > Jeff Zhang  于2020年2月21日周五 上午11:49写道:
> > > > Congratulations, Jingsong! You deserve it
> > > >
> > > > wenlong.lwl  于2020年2月21日周五 上午11:43写道:
> > > > Congrats Jingsong!
> > > >
> > > > On Fri, 21 Feb 2020 at 11:41, Dian Fu 
> > > > wrote:
> > > >
> > > > Congrats Jingsong!
> > > >
> > > > 在 2020年2月21日,上午11:39,Jark Wu  写道:
> > > >
> > > > Congratulations Jingsong! Well deserved.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Fri, 21 Feb 2020 at 11:32, zoudan  wrote:
> > > >
> > > > Congratulations! Jingsong
> > > >
> > > >
> > > > Best,
> > > > Dan Zou
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards
> > > >
> > > > Jeff Zhang
> > > >
> > > >
> > > >
> > > > --
> > > > Best, Jingsong Lee
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> >
>


[jira] [Created] (FLINK-16296) Improve performance of BaseRowSerializer#serialize() for GenericRow

2020-02-26 Thread Jark Wu (Jira)
Jark Wu created FLINK-16296:
---

 Summary: Improve performance of BaseRowSerializer#serialize() for 
GenericRow
 Key: FLINK-16296
 URL: https://issues.apache.org/jira/browse/FLINK-16296
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Runtime
Reporter: Jark Wu


Currently, when serializing a {{GenericRow}} using 
{{BaseRowSerializer#serialize()}}, there are 2 memory copies. The first is 
GenericRow -> BinaryRow, the second is BinaryRow -> DataOutputView. 

However, in theory, we can serialize a GenericRow into the DataOutputView 
directly, because we already have all the column values and types. We can 
serialize the null bit part for all columns, then the fixed-length part for all 
columns, and then the variable-length part. 

For example, when the column is a BinaryString, we can serialize the pos and 
length, calculate the new variable part length, and then serialize the next 
column. If there is a generic type in the row, it will fall back to the previous 
way. But generic types are rare in SQL. 

This is a general improvement and can benefit every operator. 
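To illustrate the proposed single-pass layout (a simplified sketch with a 
hypothetical two-column row, not the actual BaseRowSerializer code): write the 
null bits for all columns, then the fixed-length part with (offset, length) 
pointers for variable-length columns, then the variable-length data, with no 
intermediate BinaryRow.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class DirectRowSerialization {

    /** Serialize a hypothetical row of (INT, STRING) in a single pass. */
    static void serialize(Integer intCol, String strCol, DataOutputStream out)
            throws IOException {
        // 1) Null-bit part for all columns (one byte covers both fields here).
        int nullBits = (intCol == null ? 1 : 0) | (strCol == null ? 2 : 0);
        out.writeByte(nullBits);

        // 2) Fixed-length part: the int value plus (offset, length) of the
        //    string inside the variable-length part, computed up front.
        byte[] strBytes =
                strCol == null ? new byte[0] : strCol.getBytes(StandardCharsets.UTF_8);
        out.writeInt(intCol == null ? 0 : intCol);
        out.writeInt(0);               // offset of strCol in the variable part
        out.writeInt(strBytes.length); // length of strCol

        // 3) Variable-length part, written directly.
        out.write(strBytes);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        serialize(42, "hello", new DataOutputStream(target));
        System.out.println("serialized " + target.size() + " bytes in one pass");
    }
}
{code}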



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: TIME/TIMESTAMP parse in Flink TABLE/SQL API

2020-02-26 Thread Jingsong Li
Hi Jark,

The matrix I see is SQL CAST. If we need to bring in another conversion matrix
that is different from SQL CAST, I don't understand the benefits. It is
difficult for me to understand.
And it seems bad to silently change timestamps in different time zones to the
same value.

I have seen a lot of timestamp formats: SQL, ISO, RFC. I think that a
"timestampFormat" could help users deal with the various formats.
What way do you think can solve all the problems?

Best,
Jingsong Lee

On Wed, Feb 26, 2020 at 10:45 PM Jark Wu  wrote:

> Hi Jingsong,
>
> I don't think it should follow SQL CAST semantics, because it is outside of
> SQL; it happens in connectors, which convert users'/external formats into
> SQL types.
> I also suspect "timestampFormat" may not work in some cases, because the
> timestamp formats may vary and be mixed within a topic.
>
> Best,
> Jark
>
> On Wed, 26 Feb 2020 at 22:20, Jingsong Li  wrote:
>
>> Thanks all for your discussion.
>>
>> Hi Dawid,
>>
>> +1 to apply the logic of parsing a SQL timestamp literal.
>>
>> I don't fully understand the matrix you listed. Should this be the
>> semantics of SQL CAST?
>> Do you mean this is an implicit cast in the JSON parser?
>> I doubt that, because these implicit casts are not supported
>> in LogicalTypeCasts. And it is not so good to understand when it occurs
>> silently.
>>
>> How about adding a "timestampFormat" property to the JSON parser? Its default
>> value would be the SQL timestamp literal format, and users can configure this.
>>
>> Best,
>> Jingsong Lee
>>
>> On Wed, Feb 26, 2020 at 6:39 PM Jark Wu  wrote:
>>
>>> Hi Dawid,
>>>
>>> I agree with you. If we want to loosen the format constraint, the
>>> important piece is the conversion matrix.
>>>
>>> The conversion matrix you listed makes sense to me. From my understanding,
>>> there should be 6 combinations.
>>> We can add WITHOUT TIMEZONE => WITHOUT TIMEZONE and WITH TIMEZONE => WITH
>>> TIMEZONE to make the matrix complete.
>>> When the community reaches an agreement on this, we should write it down in
>>> the documentation and follow the matrix in all text-based formats.
>>>
>>> Regarding the RFC 3339 compatibility mode switch, it also sounds good to me.
>>>
>>> Best,
>>> Jark
>>>
>>> On Wed, 26 Feb 2020 at 17:44, Dawid Wysakowicz 
>>> wrote:
>>>
>>> > Hi all,
>>> >
>>> > @NiYanchun Thank you for reporting this. Yes I think we could improve
>>> the
>>> > behaviour of the JSON format.
>>> >
>>> > @Jark First of all I do agree we could/should improve the
>>> > "user-friendliness" of the JSON format (and unify the behavior across
>>> text
>>> > based formats). I am not sure though if it is as simple as just ignore
>>> the
>>> > time zone here.
>>> >
>>> > My suggestion would be rather to apply the logic of parsing a SQL
>>> > timestamp literal (if the expected type is of
>>> LogicalTypeFamily.TIMESTAMP),
>>> > which would actually also derive the "stored" type of the timestamp
>>> (either
>>> > WITHOUT TIMEZONE or WITH TIMEZONE) and then apply a proper sql
>>> conversion.
>>> > Therefore:
>>> >
>>> > parsed type      | requested type      | behaviour
>>> > WITHOUT TIMEZONE | WITH TIMEZONE       | store the local timezone with the data
>>> > WITHOUT TIMEZONE | WITH LOCAL TIMEZONE | do nothing in the data, interpret the time in local timezone
>>> > WITH TIMEZONE    | WITH LOCAL TIMEZONE | convert the timestamp to local timezone and drop the time zone information
>>> > WITH TIMEZONE    | WITHOUT TIMEZONE    | drop the time zone information
>>> >
>>> > It might just boil down to what you said "being more lenient with
>>> regards
>>> > to parsing the time zone". Nevertheless I think this way it is a bit
>>> better
>>> > defined behaviour, especially as it has a defined behaviour when
>>> converting
>>> > between representation with or without time zone.
>>> >
>>> > An implementation note. I think we should aim to base the
>>> implementation
>>> > on the DataTypes already rather than going back to the TypeInformation.
>>> >
>>> > I would still try to leave the RFC 3339 compatibility mode, but maybe
>>> for
>>> > that mode it would make sense to not support any types WITHOUT
>>> TIMEZONE?
>>> > This would be enabled with a switch (disabled by default). As I
>>> understand
>>> > the RFC, making the time zone mandatory is actually a big part of the
>>> > standard as it makes time types unambiguous.
>>> >
>>> > What do you think?
>>> >
>>> > Ps. I cross posted this on the dev ML.
>>> >
>>> > Best,
>>> >
>>> > Dawid
>>> >
>>> >
>>> > On 26/02/2020 03:45, Jark Wu wrote:
>>> >
>>> > Yes, I'm also in favor of loosening the datetime format constraint.
>>> > I guess most of the users don't know there is a JSON standard which
>>> > follows RFC 3339.
>>> >
>>> > Best,
>>> > Jark
>>> >
>>> > On Wed, 26 Feb 2020 at 10:06, NiYanchun  wrote:
>>> >
>> >> Yes, these type definitions are general. As a 

[jira] [Created] (FLINK-16295) Optimize BinaryString.copy to not materialize if there is javaObject

2020-02-26 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-16295:


 Summary: Optimize BinaryString.copy to not materialize if there is 
javaObject
 Key: FLINK-16295
 URL: https://issues.apache.org/jira/browse/FLINK-16295
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Runtime
Reporter: Jingsong Lee
 Fix For: 1.11.0


When object reuse is not enabled, this copy is a real performance killer.

CC: [~jark] [~ykt836]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16294) JDBC connector support create database table automatically

2020-02-26 Thread Leonard Xu (Jira)
Leonard Xu created FLINK-16294:
--

 Summary: JDBC connector support create database table automatically
 Key: FLINK-16294
 URL: https://issues.apache.org/jira/browse/FLINK-16294
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Ecosystem
Affects Versions: 1.11.0
Reporter: Leonard Xu
 Fix For: 1.11.0


The Kafka and Elasticsearch connectors now support creating the topic/index 
automatically when the topic/index does not exist in Kafka/Elasticsearch.
This issue aims to let the JDBC connector create the database table 
automatically as well, which will be more friendly to users.
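For illustration, a minimal sketch of the idea using plain JDBC (a hypothetical 
helper, not the proposed connector code):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AutoCreateTableSketch {

    /** Create the target table if it does not exist yet, before writing. */
    static void ensureTable(String jdbcUrl, String ddl) throws Exception {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
                Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(ddl);
        }
    }

    public static void main(String[] args) throws Exception {
        ensureTable(
                "jdbc:mysql://localhost:3306/test", // assumed endpoint
                "CREATE TABLE IF NOT EXISTS orders ("
                        + "id BIGINT PRIMARY KEY, amount DOUBLE)");
    }
}
{code}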



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-26 Thread Guowei Ma
Congratulations !!
Best,
Guowei


Yun Tang  于2020年2月27日周四 上午2:11写道:

> Congratulations and well deserved!
>
>
> Best
> Yun Tang
> 
> From: Canbin Zheng 
> Sent: Monday, February 24, 2020 16:07
> To: dev 
> Subject: Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
>
> Congratulations !!
>
> Dawid Wysakowicz  于2020年2月24日周一 下午3:55写道:
>
> > Congratulations Jingsong!
> >
> > Best,
> >
> > Dawid
> >
> > On 24/02/2020 08:12, zhenya Sun wrote:
> > > Congratulations!!!
> > > | |
> > > zhenya Sun
> > > |
> > > |
> > > toke...@126.com
> > > |
> > > 签名由网易邮箱大师定制
> > >
> > >
> > > On 02/24/2020 14:35,Yu Li wrote:
> > > Congratulations Jingsong! Well deserved.
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Mon, 24 Feb 2020 at 14:10, Congxian Qiu 
> > wrote:
> > >
> > > Congratulations Jingsong!
> > >
> > > Best,
> > > Congxian
> > >
> > >
> > > jincheng sun  于2020年2月24日周一 下午1:38写道:
> > >
> > > Congratulations Jingsong!
> > >
> > > Best,
> > > Jincheng
> > >
> > >
> > > Zhu Zhu  于2020年2月24日周一 上午11:55写道:
> > >
> > > Congratulations Jingsong!
> > >
> > > Thanks,
> > > Zhu Zhu
> > >
> > > Fabian Hueske  于2020年2月22日周六 上午1:30写道:
> > >
> > > Congrats Jingsong!
> > >
> > > Cheers, Fabian
> > >
> > > Am Fr., 21. Feb. 2020 um 17:49 Uhr schrieb Rong Rong <
> > > walter...@gmail.com>:
> > >
> > > Congratulations Jingsong!!
> > >
> > > Cheers,
> > > Rong
> > >
> > > On Fri, Feb 21, 2020 at 8:45 AM Bowen Li  wrote:
> > >
> > > Congrats, Jingsong!
> > >
> > > On Fri, Feb 21, 2020 at 7:28 AM Till Rohrmann  > >
> > > wrote:
> > >
> > > Congratulations Jingsong!
> > >
> > > Cheers,
> > > Till
> > >
> > > On Fri, Feb 21, 2020 at 4:03 PM Yun Gao 
> > > wrote:
> > >
> > > Congratulations Jingsong!
> > >
> > > Best,
> > > Yun
> > >
> > > --
> > > From:Jingsong Li 
> > > Send Time:2020 Feb. 21 (Fri.) 21:42
> > > To:Hequn Cheng 
> > > Cc:Yang Wang ; Zhijiang <
> > > wangzhijiang...@aliyun.com>; Zhenghua Gao ;
> > > godfrey
> > > he ; dev ; user <
> > > u...@flink.apache.org>
> > > Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
> > >
> > > Thanks everyone~
> > >
> > > It's my pleasure to be part of the community. I hope I can make a
> > > better
> > > contribution in future.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Fri, Feb 21, 2020 at 2:48 PM Hequn Cheng 
> > > wrote:
> > > Congratulations Jingsong! Well deserved.
> > >
> > > Best,
> > > Hequn
> > >
> > > On Fri, Feb 21, 2020 at 2:42 PM Yang Wang 
> > > wrote:
> > > Congratulations, Jingsong! Well deserved.
> > >
> > >
> > > Best,
> > > Yang
> > >
> > > Zhijiang  于2020年2月21日周五 下午1:18写道:
> > > Congrats Jingsong! Welcome on board!
> > >
> > > Best,
> > > Zhijiang
> > >
> > > --
> > > From:Zhenghua Gao 
> > > Send Time:2020 Feb. 21 (Fri.) 12:49
> > > To:godfrey he 
> > > Cc:dev ; user 
> > > Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
> > >
> > > Congrats Jingsong!
> > >
> > >
> > > *Best Regards,*
> > > *Zhenghua Gao*
> > >
> > >
> > > On Fri, Feb 21, 2020 at 11:59 AM godfrey he 
> > > wrote:
> > > Congrats Jingsong! Well deserved.
> > >
> > > Best,
> > > godfrey
> > >
> > > Jeff Zhang  于2020年2月21日周五 上午11:49写道:
> > > Congratulations, Jingsong! You deserve it
> > >
> > > wenlong.lwl  于2020年2月21日周五 上午11:43写道:
> > > Congrats Jingsong!
> > >
> > > On Fri, 21 Feb 2020 at 11:41, Dian Fu 
> > > wrote:
> > >
> > > Congrats Jingsong!
> > >
> > > 在 2020年2月21日,上午11:39,Jark Wu  写道:
> > >
> > > Congratulations Jingsong! Well deserved.
> > >
> > > Best,
> > > Jark
> > >
> > > On Fri, 21 Feb 2020 at 11:32, zoudan  wrote:
> > >
> > > Congratulations! Jingsong
> > >
> > >
> > > Best,
> > > Dan Zou
> > >
> > >
> > >
> > >
> > >
> > > --
> > > Best Regards
> > >
> > > Jeff Zhang
> > >
> > >
> > >
> > > --
> > > Best, Jingsong Lee
> > >
> > >
> > >
> > >
> > >
> > >
> >
> >
>


Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-26 Thread Yun Tang
Congratulations and well deserved!


Best
Yun Tang

From: Canbin Zheng 
Sent: Monday, February 24, 2020 16:07
To: dev 
Subject: Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

Congratulations !!

Dawid Wysakowicz  于2020年2月24日周一 下午3:55写道:

> Congratulations Jingsong!
>
> Best,
>
> Dawid
>
> On 24/02/2020 08:12, zhenya Sun wrote:
> > Congratulations!!!
> > | |
> > zhenya Sun
> > |
> > |
> > toke...@126.com
> > |
> > 签名由网易邮箱大师定制
> >
> >
> > On 02/24/2020 14:35,Yu Li wrote:
> > Congratulations Jingsong! Well deserved.
> >
> > Best Regards,
> > Yu
> >
> >
> > On Mon, 24 Feb 2020 at 14:10, Congxian Qiu 
> wrote:
> >
> > Congratulations Jingsong!
> >
> > Best,
> > Congxian
> >
> >
> > jincheng sun  于2020年2月24日周一 下午1:38写道:
> >
> > Congratulations Jingsong!
> >
> > Best,
> > Jincheng
> >
> >
> > Zhu Zhu  于2020年2月24日周一 上午11:55写道:
> >
> > Congratulations Jingsong!
> >
> > Thanks,
> > Zhu Zhu
> >
> > Fabian Hueske  于2020年2月22日周六 上午1:30写道:
> >
> > Congrats Jingsong!
> >
> > Cheers, Fabian
> >
> > Am Fr., 21. Feb. 2020 um 17:49 Uhr schrieb Rong Rong <
> > walter...@gmail.com>:
> >
> > Congratulations Jingsong!!
> >
> > Cheers,
> > Rong
> >
> > On Fri, Feb 21, 2020 at 8:45 AM Bowen Li  wrote:
> >
> > Congrats, Jingsong!
> >
> > On Fri, Feb 21, 2020 at 7:28 AM Till Rohrmann  >
> > wrote:
> >
> > Congratulations Jingsong!
> >
> > Cheers,
> > Till
> >
> > On Fri, Feb 21, 2020 at 4:03 PM Yun Gao 
> > wrote:
> >
> > Congratulations Jingsong!
> >
> > Best,
> > Yun
> >
> > --
> > From:Jingsong Li 
> > Send Time:2020 Feb. 21 (Fri.) 21:42
> > To:Hequn Cheng 
> > Cc:Yang Wang ; Zhijiang <
> > wangzhijiang...@aliyun.com>; Zhenghua Gao ;
> > godfrey
> > he ; dev ; user <
> > u...@flink.apache.org>
> > Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
> >
> > Thanks everyone~
> >
> > It's my pleasure to be part of the community. I hope I can make a
> > better
> > contribution in future.
> >
> > Best,
> > Jingsong Lee
> >
> > On Fri, Feb 21, 2020 at 2:48 PM Hequn Cheng 
> > wrote:
> > Congratulations Jingsong! Well deserved.
> >
> > Best,
> > Hequn
> >
> > On Fri, Feb 21, 2020 at 2:42 PM Yang Wang 
> > wrote:
> > Congratulations, Jingsong! Well deserved.
> >
> >
> > Best,
> > Yang
> >
> > Zhijiang  于2020年2月21日周五 下午1:18写道:
> > Congrats Jingsong! Welcome on board!
> >
> > Best,
> > Zhijiang
> >
> > --
> > From:Zhenghua Gao 
> > Send Time:2020 Feb. 21 (Fri.) 12:49
> > To:godfrey he 
> > Cc:dev ; user 
> > Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
> >
> > Congrats Jingsong!
> >
> >
> > *Best Regards,*
> > *Zhenghua Gao*
> >
> >
> > On Fri, Feb 21, 2020 at 11:59 AM godfrey he 
> > wrote:
> > Congrats Jingsong! Well deserved.
> >
> > Best,
> > godfrey
> >
> > Jeff Zhang  于2020年2月21日周五 上午11:49写道:
> > Congratulations, Jingsong! You deserve it
> >
> > wenlong.lwl  于2020年2月21日周五 上午11:43写道:
> > Congrats Jingsong!
> >
> > On Fri, 21 Feb 2020 at 11:41, Dian Fu 
> > wrote:
> >
> > Congrats Jingsong!
> >
> > 在 2020年2月21日,上午11:39,Jark Wu  写道:
> >
> > Congratulations Jingsong! Well deserved.
> >
> > Best,
> > Jark
> >
> > On Fri, 21 Feb 2020 at 11:32, zoudan  wrote:
> >
> > Congratulations! Jingsong
> >
> >
> > Best,
> > Dan Zou
> >
> >
> >
> >
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
> >
> >
> >
> > --
> > Best, Jingsong Lee
> >
> >
> >
> >
> >
> >
>
>


[jira] [Created] (FLINK-16293) Document using plugins in Kubernetes

2020-02-26 Thread Niels Basjes (Jira)
Niels Basjes created FLINK-16293:


 Summary: Document using plugins in Kubernetes
 Key: FLINK-16293
 URL: https://issues.apache.org/jira/browse/FLINK-16293
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Kubernetes
Affects Versions: 1.10.0
Reporter: Niels Basjes


It took me some time to figure out how to enable plugins when running Flink on 
Kubernetes.
So I'm writing some documentation to save other people who try the same a lot 
of time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16292) Execute all end to end tests on AZP

2020-02-26 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-16292:
--

 Summary: Execute all end to end tests on AZP
 Key: FLINK-16292
 URL: https://issues.apache.org/jira/browse/FLINK-16292
 Project: Flink
  Issue Type: Sub-task
  Components: Build System
Reporter: Robert Metzger


Ensure that we execute all end to end tests on AZP:
- Make sure that all the e2e tests referenced in the splits are also referenced 
in the "run nightly tests" script
- Make sure the Java e2e tests are executed



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Batch Flink Job S3 write performance vs Spark

2020-02-26 Thread Arvid Heise
Exactly. We use the hadoop-fs as an indirection on top of that, but Spark
probably does the same.

On Wed, Feb 26, 2020 at 3:52 PM sri hari kali charan Tummala <
kali.tumm...@gmail.com> wrote:

> Thank you (the two systems running on Java and using the same set of
> libraries), so from my understanding, Flink uses the AWS SDK behind the
> scenes, same as Spark.
>
> On Wed, Feb 26, 2020 at 8:49 AM Arvid Heise  wrote:
>
>> Fair benchmarks are notoriously difficult to set up.
>>
>> Usually, it's easy to find a workload where one system shines and as its
>> vendor you report that. Then, the competitor benchmarks a different use
>> case where his system outperforms ours. In the end, customers are more
>> confused than before.
>>
>> You should do your own benchmarks for your own workloads. That is the
>> only reliable way.
>>
>> In the end, both systems use similar setups and improvements in one
>> system are often also incorporated into the other system with some delay,
>> such that there should be no ground-breaking differences between the two
>> systems running on Java and using the same set of libraries.
>> Of course, if one system has a very specific optimization for your use
>> case, that could be much faster.
>>
>>
>> On Mon, Feb 24, 2020 at 11:26 PM sri hari kali charan Tummala <
>> kali.tumm...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> have a question did anyone compared the performance of Flink batch job
>>> writing to s3 vs spark writing to s3?
>>>
>>> --
>>> Thanks & Regards
>>> Sri Tummala
>>>
>>>
>
> --
> Thanks & Regards
> Sri Tummala
>
>


Re: [DISCUSS] FLIP-76: Unaligned checkpoints

2020-02-26 Thread Zhijiang
Thanks for the further explanations, Yu!

1. The inflight buffer spilling process is indeed handled asynchronously. While 
a buffer has not finished spilling, it will not be recycled for reuse (see the 
sketch after this list).
Your understanding is right. I guess I misunderstood your previous concern about 
additional memory consumption from the perspective of buffer usage.
My point of no additional memory consumption is from the perspective of the 
total network memory size, which would not be increased as a result.

2. We treat the inflight buffers as input state, which is equivalent to existing 
operator state, and try to make use of all the existing mechanisms for state 
handling and assignment during recovery. So I guess it should be the similar 
case for local recovery. I will think through whether there is some special 
work to do around local recovery, and then clarify it in the FLIP after we 
reach an agreement internally. BTW, this FLIP has not been finalized yet.

3. Yes, the previous proposal is for measuring how many inflight buffers are to 
be spilled, which refers to the data size if we really take this way. I think 
the proposed options in the FLIP are the initial thoughts on the various 
possibilities. We need to further finalize which way to take for the first 
version before voting.

4. I think there probably exist such requirements or scenarios from users as you 
mentioned. Actually, we have not finalized the way of switching to unaligned 
checkpoints yet.
Anyway, we could provide an option for users to try out this feature at the 
beginning, although it might not be the most ideal one. Another input is that we 
know the motivation for unaligned checkpoints comes from backpressure scenarios, 
but they might also perform well in the case of no backpressure, even shortening 
the checkpoint duration without obvious performance regression in our previous 
POC testing. So backpressure might not be the only factor for switching to the 
unaligned way in practice, I guess. Anyway, your inputs are valuable for us to 
make the final decision.
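To illustrate the recycling rule from point 1, here is a minimal sketch with 
hypothetical types (not Flink's actual buffer classes): a buffer that is being 
spilled stays out of the pool and is recycled only after the asynchronous spill 
completes, so no copy-on-write is needed.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SpillThenRecycle {

    interface BufferPool { void recycle(byte[] buffer); }

    interface SpillWriter { void write(byte[] buffer); }

    static CompletableFuture<Void> spillAndRecycle(
            byte[] buffer, SpillWriter writer, BufferPool pool, ExecutorService io) {
        // The buffer stays out of the pool while the asynchronous spill runs...
        return CompletableFuture
                .runAsync(() -> writer.write(buffer), io)
                // ...and is recycled (available again for receiving/sending data)
                // only after the write has finished.
                .thenRun(() -> pool.recycle(buffer));
    }

    public static void main(String[] args) {
        ExecutorService io = Executors.newSingleThreadExecutor();
        spillAndRecycle(
                new byte[32 * 1024],
                b -> System.out.println("spilled " + b.length + " bytes"),
                b -> System.out.println("recycled " + b.length + " bytes"),
                io).join();
        io.shutdown();
    }
}
{code}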

Best,
Zhijiang




--
From:Yu Li 
Send Time:2020 Feb. 26 (Wed.) 15:59
To:dev ; Zhijiang 
Subject:Re: [DISCUSS] FLIP-76: Unaligned checkpoints

Hi Zhijiang,

Thanks for the quick reply!

For the 1st question, please allow me to confirm that when doing asynchronous 
checkpointing, disk spilling should happen in the background in parallel with 
receiving/sending new data, or else it would become synchronous, right? Based 
on that assumption, some copy-on-write-like mechanism would be necessary to 
make sure the new updates won't modify the to-be-checkpointed data, and this is 
where the additional memory consumption comes from.

About point #2, I suggest we write down the local recovery support in the FLIP 
document (if we reach a consensus here), to make sure it won't be neglected in 
the later implementation (I believe there is still some work to do following 
the existing local recovery mechanism). What do you think?

For the 3rd topic, do you mean UNALIGNED_WITH_MAX_INFLIGHT_DATA would set some 
kind of threshold about "how much in-flight data to checkpoint"? If so, could 
you further clarify the measurement (data size? record number? others?) since 
there seems to be no description in the current FLIP doc? This is somewhat 
different from my understanding after reading the FLIP...

Regarding question #4, I have no doubt that the new unaligned checkpoint 
mechanism could make fast checkpoints possible, at the cost of more memory, 
network bandwidth and disk space consumption. However, (correct me if I'm 
wrong) for users who are satisfied with the existing aligned checkpoint 
interval, taking the constant cost to prevent delayed checkpoints during back 
pressure - a relatively low-frequency event - may not be that pragmatic.

Best Regards,
Yu

On Wed, 26 Feb 2020 at 15:07, Zhijiang  
wrote:
Hi Yu,

 Thanks for your attention to this FLIP and for sharing your thoughts! Let me 
try to answer the questions below.

 1. Yes, the asynchronous checkpointing should be part of the whole process and be 
supported naturally. As for the network memory concern, 
 the inflight buffers would be spilled into persistent storage while triggering a 
checkpoint, and are recycled to receive/send data after spilling finishes.
 We still reuse the current network memory setting, so the maximum number of 
inflight buffers would not exceed that amount, and there would be no
  additional memory consumption.

 2. Yes, we would try to reuse the existing checkpoint recovery mechanism for 
simple implementation.

 3. UNALIGNED_WITH_MAX_INFLIGHT_DATA and UNALIGNED_WITH_UNLIMITED_INFLIGHT_DATA 
are for the consideration of triggering the checkpoint
 at the proper time, the tradeoff between checkpoint duration and spilling inflight 
data, etc. I guess it still makes sense for a single input channel.
  Assuming there were already 100 accumulated unconsumed buffers in one remote 
input channel when 

Re: [VOTE] FLIP-100: Add Attempt Information

2020-02-26 Thread Till Rohrmann
Fair enough. If this should become a problem we could introduce it later as
well.

What about changing the SubtasksTimeInfo response type into an array of
SubtaskTimeInfo? At the moment SubtasksTimeInfo contains a
flattened SubtaskTimeInfo and an array of SubtaskTimeInfo for the previous
attempts.

Cheers,
Till

On Wed, Feb 26, 2020 at 1:16 PM Yadong Xie  wrote:

> Hi Till
>
> Thanks for your comments.
>
> > I have a comment concerning the SubtasksTimesHandler
>
> It would be much easier for the frontend to handle a large amount of data
> if we had a REST API filter parameter, but in my opinion, the attempt list
> data is not large enough that we have to rely on REST API parameters for
> paging; we can still handle it all in the frontend.
>
> Users can filter the attempt list by the status (scheduled/created/deploying
> and so on) and other keywords (attempt_id and so on) directly in the
> frontend, since all the data is returned by the REST API.
> If we moved some of the filter parameters to the REST API path parameters,
> all the other filter parameters would need to be moved too.
>
> I suggest adding an attempt id filter in the UI to help users filter for the
> desired attempt, with all the filtering running inside the browser.
> What do you think about this?
>
>
>
>
> Till Rohrmann  于2020年2月25日周二 下午11:40写道:
>
> > Hi Yadong,
> >
> > thanks for creating this FLIP. I like the idea of making the web UI
> > information richer with regard to subtask attempt information.
> >
> > I have a comment concerning the SubtasksTimesHandler: Should we change
> the
> > response type SubtasksTimeInfo so that it simply contains an
> > array of SubtaskTimeInfo? One could add an attempt range path parameter
> to
> > the SubtasksTimesHandler to be able to control which attempts will be
> > returned.
> >
> > Cheers,
> > Till
> >
> > On Tue, Feb 25, 2020 at 9:57 AM Benchao Li  wrote:
> >
> > > Hi Yadong,
> > >
> > > Thanks for the updating.  LGTM now.
> > >
> > > +1 (non-binding)
> > >
> > > Yadong Xie  于2020年2月25日周二 下午4:41写道:
> > >
> > > > Hi Kurt
> > > >
> > > > There will be no differences between batch jobs and stream jobs at
> > > > the subtask-attempt level in the UI.
> > > > The only differences are in the vertex timeline. I have added a
> > > > screenshot of a batch job to FLIP-100, since a batch job disappears
> > > > from the list soon after it finishes.
> > > > here is the link:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-100%3A+Add+Attempt+Information
> > > >
> > > >
> > > > Kurt Young  于2020年2月21日周五 上午11:51写道:
> > > >
> > > > > Hi Yadong,
> > > > >
> > > > > Thanks for the proposal, it's a useful feature, especially for
> > > > > batch jobs. But according to the examples you gave, I can't tell
> > > > > whether I got the required information from that.
> > > > > Can you replace the demo job with a more complex batch job, so we
> > > > > can see some differences in the start/stop times of different
> > > > > tasks and attempts?
> > > > >
> > > > > Best,
> > > > > Kurt
> > > > >
> > > > >
> > > > > On Thu, Feb 20, 2020 at 5:46 PM Yadong Xie 
> > > wrote:
> > > > >
> > > > > > Hi all
> > > > > >
> > > > > > I want to start the vote for FLIP-100, which proposes to add
> > attempt
> > > > > > information inside subtask and timeline in web UI.
> > > > > >
> > > > > > To help everyone better understand the proposal, we spent some
> > > efforts
> > > > on
> > > > > > making an online POC
> > > > > >
> > > > > > Timeline Attempt (click the vertex timeline to see the
> > differences):
> > > > > > previous web:
> > > > > >
> > > > >
> > > >
> > >
> >
> http://101.132.122.69:8081/#/job/9d651769488466d33e7a607e85203543/timeline
> > > > > > POC web:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://101.132.122.69:8081/web/#/job/9d651769488466d33e7a607e85203543/timeline
> > > > > >
> > > > > > Subtask Attempt (click the vertex and switch to subtask tab to
> see
> > > the
> > > > > > differences):
> > > > > > previous web:
> > > > > >
> > > > >
> > > >
> > >
> >
> http://101.132.122.69:8081/#/job/9d651769488466d33e7a607e85203543/overview
> > > > > > POC web:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://101.132.122.69:8081/web/#/job/9d651769488466d33e7a607e85203543/overview
> > > > > >
> > > > > >
> > > > > > The vote will last for at least 72 hours, following the consensus
> > > > voting
> > > > > > process.
> > > > > >
> > > > > > FLIP wiki:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-100%3A+Add+Attempt+Information
> > > > > >
> > > > > > Discussion thread:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-75-Flink-Web-UI-Improvement-Proposal-td33540.html
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Yadong
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > 

Re: Batch Flink Job S3 write performance vs Spark

2020-02-26 Thread Arvid Heise
Fair benchmarks are notoriously difficult to set up.

Usually, it's easy to find a workload where one system shines and as its
vendor you report that. Then, the competitor benchmarks a different use
case where his system outperforms ours. In the end, customers are more
confused than before.

You should do your own benchmarks for your own workloads. That is the only
reliable way.

In the end, both systems use similar setups and improvements in one system
are often also incorporated into the other system with some delay, such
that there should be no ground-breaking differences between the two systems
running on Java and using the same set of libraries.
Of course, if one system has a very specific optimization for your use
case, that could be much faster.


On Mon, Feb 24, 2020 at 11:26 PM sri hari kali charan Tummala <
kali.tumm...@gmail.com> wrote:

> Hi All,
>
> have a question did anyone compared the performance of Flink batch job
> writing to s3 vs spark writing to s3?
>
> --
> Thanks & Regards
> Sri Tummala
>
>


Re: TIME/TIMESTAMP parse in Flink TABLE/SQL API

2020-02-26 Thread Jark Wu
Hi Jingsong,

I don't think it should follow SQL CAST semantics, because it is outside of
SQL; it happens in connectors, which convert users'/external formats into
SQL types.
I also suspect "timestampFormat" may not work in some cases, because the
timestamp formats may vary and be mixed within a topic.

Best,
Jark

On Wed, 26 Feb 2020 at 22:20, Jingsong Li  wrote:

> Thanks all for your discussion.
>
> Hi Dawid,
>
> +1 to apply the logic of parsing a SQL timestamp literal.
>
> I don't fully understand the matrix you listed. Should this be the
> semantics of SQL CAST?
> Do you mean this is an implicit cast in the JSON parser?
> I doubt that, because these implicit casts are not supported
> in LogicalTypeCasts. And it is not so good to understand when it occurs
> silently.
>
> How about adding a "timestampFormat" property to the JSON parser? Its default
> value would be the SQL timestamp literal format, and users can configure this.
>
> Best,
> Jingsong Lee
>
> On Wed, Feb 26, 2020 at 6:39 PM Jark Wu  wrote:
>
>> Hi Dawid,
>>
>> I agree with you. If we want to loosen the format constraint, the
>> important piece is the conversion matrix.
>>
>> The conversion matrix you listed makes sense to me. From my understanding,
>> there should be 6 combinations.
>> We can add WITHOUT TIMEZONE => WITHOUT TIMEZONE and WITH TIMEZONE => WITH
>> TIMEZONE to make the matrix complete.
>> When the community reaches an agreement on this, we should write it down in
>> the documentation and follow the matrix in all text-based formats.
>>
>> Regarding the RFC 3339 compatibility mode switch, it also sounds good to me.
>>
>> Best,
>> Jark
>>
>> On Wed, 26 Feb 2020 at 17:44, Dawid Wysakowicz 
>> wrote:
>>
>> > Hi all,
>> >
>> > @NiYanchun Thank you for reporting this. Yes I think we could improve
>> the
>> > behaviour of the JSON format.
>> >
>> > @Jark First of all I do agree we could/should improve the
>> > "user-friendliness" of the JSON format (and unify the behavior across
>> text
>> > based formats). I am not sure though if it is as simple as just ignore
>> the
>> > time zone here.
>> >
>> > My suggestion would be rather to apply the logic of parsing a SQL
>> > timestamp literal (if the expected type is of
>> LogicalTypeFamily.TIMESTAMP),
>> > which would actually also derive the "stored" type of the timestamp
>> (either
>> > WITHOUT TIMEZONE or WITH TIMEZONE) and then apply a proper sql
>> conversion.
>> > Therefore:
>> >
>> > parsed type      | requested type      | behaviour
>> > WITHOUT TIMEZONE | WITH TIMEZONE       | store the local timezone with the data
>> > WITHOUT TIMEZONE | WITH LOCAL TIMEZONE | do nothing in the data, interpret the time in local timezone
>> > WITH TIMEZONE    | WITH LOCAL TIMEZONE | convert the timestamp to local timezone and drop the time zone information
>> > WITH TIMEZONE    | WITHOUT TIMEZONE    | drop the time zone information
>> >
>> > It might just boil down to what you said "being more lenient with
>> regards
>> > to parsing the time zone". Nevertheless I think this way it is a bit
>> better
>> > defined behaviour, especially as it has a defined behaviour when
>> converting
>> > between representation with or without time zone.
>> >
>> > An implementation note. I think we should aim to base the implementation
>> > on the DataTypes already rather than going back to the TypeInformation.
>> >
>> > I would still try to leave the RFC 3339 compatibility mode, but maybe
>> for
>> > that mode it would make sense to not support any types WITHOUT TIMEZONE?
>> > This would be enabled with a switch (disabled by default). As I
>> understand
>> > the RFC, making the time zone mandatory is actually a big part of the
>> > standard as it makes time types unambiguous.
>> >
>> > What do you think?
>> >
>> > Ps. I cross posted this on the dev ML.
>> >
>> > Best,
>> >
>> > Dawid
>> >
>> >
>> > On 26/02/2020 03:45, Jark Wu wrote:
>> >
>> > Yes, I'm also in favor of loosening the datetime format constraint.
>> > I guess most of the users don't know there is a JSON standard which
>> > follows RFC 3339.
>> >
>> > Best,
>> > Jark
>> >
>> > On Wed, 26 Feb 2020 at 10:06, NiYanchun  wrote:
>> >
>> >> Yes, these type definitions are general. As a user/developer, I would
>> >> support “loosening it for usability”. If not, we may add some explanation
>> >> about JSON.
>> >>
>> >>
>> >>
>> >>  Original Message
>> >> *Sender:* Jark Wu
>> >> *Recipient:* Outlook; Dawid Wysakowicz<
>> >> dwysakow...@apache.org>
>> >> *Cc:* godfrey he; Leonard Xu;
>> >> user
>> >> *Date:* Wednesday, Feb 26, 2020 09:55
>> >> *Subject:* Re: TIME/TIMESTAMP parse in Flink TABLE/SQL API
>> >>
>> >> Hi Outlook,
>> >>
>> >> The explanation in DataTypes is correct; it is compliant with the SQL
>> >> standard.
>> >> The problem is that JsonRowDeserializationSchema only supports RFC-3339.
>> >> On the other hand, CsvRowDeserializationSchema supports parsing
>> >> "2019-07-09 

[jira] [Created] (FLINK-16291) Count(*) doesn't work with Hive module

2020-02-26 Thread Rui Li (Jira)
Rui Li created FLINK-16291:
--

 Summary: Count(*) doesn't work with Hive module
 Key: FLINK-16291
 URL: https://issues.apache.org/jira/browse/FLINK-16291
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive, Table SQL / Planner
Reporter: Rui Li






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16290) HttpUrl might return NULL if a scheme is missing

2020-02-26 Thread Igal Shilman (Jira)
Igal Shilman created FLINK-16290:


 Summary: HttpUrl might return NULL if a scheme is missing
 Key: FLINK-16290
 URL: https://issues.apache.org/jira/browse/FLINK-16290
 Project: Flink
  Issue Type: Bug
  Components: Stateful Functions
Reporter: Igal Shilman
Assignee: Igal Shilman


okhttp's HttpUrl might be null if the original endpoint definition
is missing the scheme (or the scheme is not http or https).
To prevent that, we need to validate the parsed URI in JsonModule.
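A minimal sketch of the validation idea (hypothetical helper and error message; 
the real check would live in JsonModule, assuming okhttp3 on the classpath):

{code:java}
import okhttp3.HttpUrl;

public final class EndpointValidation {

    static HttpUrl validateEndpoint(String endpoint) {
        // HttpUrl.parse(...) returns null for anything that is not a valid
        // absolute http/https URL, e.g. when the scheme is missing.
        HttpUrl url = HttpUrl.parse(endpoint);
        if (url == null) {
            throw new IllegalArgumentException(
                    "Invalid endpoint '" + endpoint
                            + "': expected an absolute http:// or https:// URL.");
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(validateEndpoint("https://example.com/statefun")); // ok
        validateEndpoint("example.com/statefun"); // throws: scheme is missing
    }
}
{code}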



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: TIME/TIMESTAMP parse in Flink TABLE/SQL API

2020-02-26 Thread Jingsong Li
Thanks all for your discussion.

Hi Dawid,

+1 to apply the logic of parsing a SQL timestamp literal.

I don't fully understand the matrix you listed. Should this be the semantics
of SQL CAST?
Do you mean this is an implicit cast in the JSON parser?
I doubt that, because these implicit casts are not supported
in LogicalTypeCasts. And it is not so good to understand when it occurs
silently.

How about adding a "timestampFormat" property to the JSON parser? Its default
value would be the SQL timestamp literal format, and users can configure this.
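A minimal sketch of what such a property could look like (a hypothetical helper 
using java.time; the default pattern shown is an assumption):

{code:java}
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TimestampFormatOption {

    // Assumed default: the SQL timestamp literal, e.g. "2019-07-09 02:02:00.040".
    static final DateTimeFormatter SQL_LITERAL =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss[.SSS]");

    static LocalDateTime parse(String value, DateTimeFormatter timestampFormat) {
        return LocalDateTime.parse(value, timestampFormat);
    }

    public static void main(String[] args) {
        // The default handles the SQL literal that RFC 3339 parsing rejects.
        System.out.println(parse("2019-07-09 02:02:00.040", SQL_LITERAL));
        // A user-configured pattern covers other formats found in the data.
        System.out.println(parse("2019/07/09 02:02:00",
                DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss")));
    }
}
{code}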

Best,
Jingsong Lee

On Wed, Feb 26, 2020 at 6:39 PM Jark Wu  wrote:

> Hi Dawid,
>
> I agree with you. If we want to loosen the format constraint, the
> important piece is the conversion matrix.
>
> The conversion matrix you listed makes sense to me. From my understanding,
> there should be 6 combinations.
> We can add WITHOUT TIMEZONE => WITHOUT TIMEZONE and WITH TIMEZONE => WITH
> TIMEZONE to make the matrix complete.
> When the community reaches an agreement on this, we should write it down in
> the documentation and follow the matrix in all text-based formats.
>
> Regarding the RFC 3339 compatibility mode switch, it also sounds good to
> me.
>
> Best,
> Jark
>
> On Wed, 26 Feb 2020 at 17:44, Dawid Wysakowicz 
> wrote:
>
> > Hi all,
> >
> > @NiYanchun Thank you for reporting this. Yes I think we could improve the
> > behaviour of the JSON format.
> >
> > @Jark First of all I do agree we could/should improve the
> > "user-friendliness" of the JSON format (and unify the behavior across
> text
> > based formats). I am not sure though if it is as simple as just ignore
> the
> > time zone here.
> >
> > My suggestion would be rather to apply the logic of parsing a SQL
> > timestamp literal (if the expected type is of
> LogicalTypeFamily.TIMESTAMP),
> > which would actually also derive the "stored" type of the timestamp
> (either
> > WITHOUT TIMEZONE or WITH TIMEZONE) and then apply a proper sql
> conversion.
> > Therefore:
> >
> > parsed type      | requested type      | behaviour
> > WITHOUT TIMEZONE | WITH TIMEZONE       | store the local timezone with the data
> > WITHOUT TIMEZONE | WITH LOCAL TIMEZONE | do nothing in the data, interpret the time in local timezone
> > WITH TIMEZONE    | WITH LOCAL TIMEZONE | convert the timestamp to local timezone and drop the time zone information
> > WITH TIMEZONE    | WITHOUT TIMEZONE    | drop the time zone information
> >
> > It might just boil down to what you said "being more lenient with regards
> > to parsing the time zone". Nevertheless I think this way it is a bit
> better
> > defined behaviour, especially as it has a defined behaviour when
> converting
> > between representation with or without time zone.
> >
> > An implementation note. I think we should aim to base the implementation
> > on the DataTypes already rather than going back to the TypeInformation.
> >
> > I would still try to leave the RFC 3339 compatibility mode, but maybe for
> > that mode it would make sense to not support any types WITHOUT TIMEZONE?
> > This would be enabled with a switch (disabled by default). As I
> understand
> > the RFC, making the time zone mandatory is actually a big part of the
> > standard as it makes time types unambiguous.
> >
> > What do you think?
> >
> > Ps. I cross posted this on the dev ML.
> >
> > Best,
> >
> > Dawid
> >
> >
> > On 26/02/2020 03:45, Jark Wu wrote:
> >
> > Yes, I'm also in favor of loosening the datetime format constraint.
> > I guess most of the users don't know there is a JSON standard which
> > follows RFC 3339.
> >
> > Best,
> > Jark
> >
> > On Wed, 26 Feb 2020 at 10:06, NiYanchun  wrote:
> >
> >> Yes, these type definitions are general. As a user/developer, I would
> >> support “loosening it for usability”. If not, we may add some explanation
> >> about JSON.
> >>
> >>
> >>
> >>  Original Message
> >> *Sender:* Jark Wu
> >> *Recipient:* Outlook; Dawid Wysakowicz<
> >> dwysakow...@apache.org>
> >> *Cc:* godfrey he; Leonard Xu;
> >> user
> >> *Date:* Wednesday, Feb 26, 2020 09:55
> >> *Subject:* Re: TIME/TIMESTAMP parse in Flink TABLE/SQL API
> >>
> >> Hi Outlook,
> >>
> >> The explanation in DataTypes is correct; it is compliant with the SQL
> >> standard.
> >> The problem is that JsonRowDeserializationSchema only supports RFC-3339.
> >> On the other hand, CsvRowDeserializationSchema supports parsing
> >> "2019-07-09 02:02:00.040".
> >>
> >> So the question is shall we insist on the RFC-3339 "standard"? Shall we
> >> loosen it for usability?
> >> What do you think @Dawid Wysakowicz  ?
> >>
> >> Best,
> >> Jark
> >>
> >> On Wed, 26 Feb 2020 at 09:29, Outlook  wrote:
> >>
> >>> Thanks Godfrey and Leonard, I tried your answers; the result is OK.
> >>>
> >>>
> >>> BTW, I think if we only accept such a format for a long time, the TIME and
> >>> TIMESTAMP methods' docs in `org.apache.flink.table.api.DataTypes` may be
> >>> better to update,
> >>>
> >>> 

[jira] [Created] (FLINK-16289) Missing serialVersionUID blocks running in Kubernetes.

2020-02-26 Thread Niels Basjes (Jira)
Niels Basjes created FLINK-16289:


 Summary: Missing serialVersionUID blocks running in Kubernetes.
 Key: FLINK-16289
 URL: https://issues.apache.org/jira/browse/FLINK-16289
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes, Table SQL / Runtime
Reporter: Niels Basjes


I have written a Flink 1.10 job that reads a file (using the S3 Presto client), 
applies an SQL statement to it (with 
[Yauaa|https://yauaa.basjes.nl/UDF-ApacheFlinkTable.html] as a UDF) and then 
tries to write it to ElasticSearch.

The problem is that when I submit this to the native Kubernetes cluster I get 
this exception (full stack trace below):

{code:java}java.io.InvalidClassException: 
org.apache.flink.table.codegen.GeneratedAggregationsFunction; local class 
incompatible: stream classdesc serialVersionUID = 1538379512770243128, local 
class serialVersionUID = -5485442333060060467 {code}

According to [this Stack Overflow 
answer|https://stackoverflow.com/a/10378907/114196], this error stems from 
the JVM automatically generating a serialVersionUID when it is missing, 
which can be JDK/JRE version dependent.

On my local machine (Ubuntu 16.04 LTS) I use the openjdk-9-jdk.

Apparently the Flink Docker image uses JRE 1.8
{code}
KubernetesTaskExecutorRunner  - 

KubernetesTaskExecutorRunner  -  Starting Kubernetes TaskExecutor runner 
(Version: 1.10.0, Rev:aa4eb8f, Date:07.02.2020 @ 19:18:19 CET)
KubernetesTaskExecutorRunner  -  OS current user: root
KubernetesTaskExecutorRunner  -  Current Hadoop/Kerberos user: 
KubernetesTaskExecutorRunner  -  JVM: OpenJDK 64-Bit Server VM - Oracle 
Corporation - 1.8/25.242-b08
{code}

I have tried doing the same with JDK 1.8 on my machine but that still does not 
work (apparently there is still too big a difference in the Java versions).

When I remove the "group by" (i.e. the Aggregation) from my SQL statement this 
passes (and right now fails on missing dependencies ... different problem).
 
{code:java}
2020-02-26 11:40:48,303 INFO org.apache.flink.runtime.taskmanager.Task - 
groupBy: (useragent, DeviceClass, AgentNameVersionMajor), window: 
(TumblingGroupWindow('w$, 'EventTime, 360.millis)), select: (useragent, 
DeviceClass, AgentNameVersionMajor, SUM(clicks) AS clicks, SUM(visitors) AS 
visitors, start('w$) AS w$start, end('w$) AS w$end, rowtime('w$) AS w$rowtime, 
proctime('w$) AS w$proctime) -> select: (w$start AS wStart, useragent, 
DeviceClass, AgentNameVersionMajor, clicks, visitors) -> to: Row -> Sink: 
Unnamed (11/32) (db5cb408a1b286e705a2e3e30ac8131e) switched from RUNNING to 
FAILED.
 org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot 
instantiate user function.
 at 
org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperatorFactory(StreamConfig.java:269)
 at 
org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:115)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:433)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:461)
 at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
 at java.lang.Thread.run(Thread.java:748)
 Caused by: java.io.InvalidClassException: 
org.apache.flink.table.codegen.GeneratedAggregationsFunction; local class 
incompatible: stream classdesc serialVersionUID = 1538379512770243128, local 
class serialVersionUID = -5485442333060060467
 at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
 at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1940)
 at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1806)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2097)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2342)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2266)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2124)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2342)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2266)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2124)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2342)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2266)
 at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2124)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
 at 

Re: [VOTE] FLIP-100: Add Attempt Information

2020-02-26 Thread Yadong Xie
Hi Till

Thanks for your comments.

> I have a comment concerning the SubtasksTimesHandler

A filter parameter on the REST API would make it much easier for the frontend
to handle a large amount of data, but in my opinion the attempt list is not
large enough that we have to rely on REST API paging; we can still handle it
all in the frontend.

Users can filter the attempt list by status (scheduled/created/deploying,
and so on) and other keywords (attempt_id, and so on) directly in the
frontend, since all data are returned by the REST API.
If we moved some of the filter parameters into REST API path parameters,
all the other filter parameters would need to be moved too.

I suggest adding an attempt id filter in the UI to help users find the
desired attempt, with all filtering running on the browser side. What do
you think about this?




Till Rohrmann  于2020年2月25日周二 下午11:40写道:

> Hi Yadong,
>
> thanks for creating this FLIP. I like the idea to make the web-ui
> information richer wrt to subtask attempt information.
>
> I have a comment concerning the SubtasksTimesHandler: Should we change the
> response type SubtasksTimeInfo so that it simply contains an
> array of SubtaskTimeInfo? One could add an attempt range path parameter to
> the SubtasksTimesHandler to be able to control which attempts will be
> returned.
>
> Cheers,
> Till
>
> On Tue, Feb 25, 2020 at 9:57 AM Benchao Li  wrote:
>
> > Hi Yadong,
> >
> > Thanks for the updating.  LGTM now.
> >
> > +1 (non-binding)
> >
> > Yadong Xie  于2020年2月25日周二 下午4:41写道:
> >
> > > Hi Kurt
> > >
> > > There will be no differences between batch jobs and stream jobs in
> > > subtask-attempt level in the UI
> > > The only differences are in the vertex timeline, I have added a
> > screenshot
> > > of the batch job in the FLIP-100 since the batch job will disappear
> from
> > > the list after it finished soon.
> > > here is the link:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-100%3A+Add+Attempt+Information
> > >
> > >
> > > Kurt Young  于2020年2月21日周五 上午11:51写道:
> > >
> > > > Hi Yadong,
> > > >
> > > > Thanks for the proposal, it's a useful feature, especially for batch
> > > jobs.
> > > > But according
> > > > to the examples you gave, I can't tell whether i got required
> > information
> > > > from that.
> > > > Can you replace the demo job to a more complex batch job and then we
> > can
> > > > see some
> > > > differences of start/stop time of different tasks and attempts?
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Thu, Feb 20, 2020 at 5:46 PM Yadong Xie 
> > wrote:
> > > >
> > > > > Hi all
> > > > >
> > > > > I want to start the vote for FLIP-100, which proposes to add
> attempt
> > > > > information inside subtask and timeline in web UI.
> > > > >
> > > > > To help everyone better understand the proposal, we spent some
> > efforts
> > > on
> > > > > making an online POC
> > > > >
> > > > > Timeline Attempt (click the vertex timeline to see the
> differences):
> > > > > previous web:
> > > > >
> > > >
> > >
> >
> http://101.132.122.69:8081/#/job/9d651769488466d33e7a607e85203543/timeline
> > > > > POC web:
> > > > >
> > > > >
> > > >
> > >
> >
> http://101.132.122.69:8081/web/#/job/9d651769488466d33e7a607e85203543/timeline
> > > > >
> > > > > Subtask Attempt (click the vertex and switch to subtask tab to see
> > the
> > > > > differences):
> > > > > previous web:
> > > > >
> > > >
> > >
> >
> http://101.132.122.69:8081/#/job/9d651769488466d33e7a607e85203543/overview
> > > > > POC web:
> > > > >
> > > > >
> > > >
> > >
> >
> http://101.132.122.69:8081/web/#/job/9d651769488466d33e7a607e85203543/overview
> > > > >
> > > > >
> > > > > The vote will last for at least 72 hours, following the consensus
> > > voting
> > > > > process.
> > > > >
> > > > > FLIP wiki:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-100%3A+Add+Attempt+Information
> > > > >
> > > > > Discussion thread:
> > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-75-Flink-Web-UI-Improvement-Proposal-td33540.html
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Yadong
> > > > >
> > > >
> > >
> >
> >
> > --
> >
> > Benchao Li
> > School of Electronics Engineering and Computer Science, Peking University
> > Tel:+86-15650713730
> > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> >
>


Re: [VOTE] FLIP-104: Add More Metrics to Jobmanager

2020-02-26 Thread lining jing
Hi Till,
thanks for your reply.


> Concerning FLINK-9741, I'm not sure whether we need to fix this issue
> before starting this effort. The JobManager's are now running as part of
> the cluster entrypoint process for which we should actually report the
> metrics (memory usage).


I have confirmed this with Zhu Zhu offline: since the dispatcher still runs in
the same process as the JobManager, it should not affect the accuracy of the
metric.

Till Rohrmann  于2020年2月26日周三 上午12:04写道:

> Hi Yadong,
>
> thanks for creating this FLIP. I like the idea of exposing more
> cluster information to the user.
>
> I share Xintong's concerns that we are about to rework the cluster
> entrypoint's memory management. It might make sense to wait for these
> changes before starting this effort. Otherwise, we might risk to do some
> double work.
>
> Concerning FLINK-9741, I'm not sure whether we need to fix this issue
> before starting this effort. The JobManager's are now running as part of
> the cluster entrypoint process for which we should actually report the
> metrics (memory usage).
>
> Cheers,
> Till
>
> On Tue, Feb 25, 2020 at 10:52 AM Jark Wu  wrote:
>
> > Thanks Xintong for the explanation.
> >
> > The FLIP looks good to me now. +1 from my side.
> >
> > Best,
> > Jark
> >
> > On Tue, 25 Feb 2020 at 15:46, Xintong Song 
> wrote:
> >
> > > @Jark
> > >
> > > First, let me try to clarify that, while this FLIP is about adding JM
> > > metrics, the discussion of having different colors distinguishing the
> > > memory usage applies for both JM and TM.
> > >
> > > IMO, I don't think there's a good way to define how should memory
> > > utilization be mapped to colors in general.
> > >
> > >- Direct memory
> > >   - JM: ATM, we do not specify -XX:MaxDirectMemorySize.
> > >   - TM: Direct memory consists of network memory and framework/task
> > >   off-heap memory, the former should always be 100% while the
> latter
> > may not.
> > >   Therefore, the utilization of direct memory really depends on the
> > >   configured size of network memory and framework/task off-heap
> > memory.
> > >- Heap memory: We might observe that the memory usage keeps growing
> > >until GC is triggered, thus eventually the utilization might
> > fluctuates at
> > >somewhere close to 100%.
> > >
> > > In general, a low memory utilization probably suggests that the memory
> > > size is configured too large, but a high memory utilization does not
> > > necessarily suggest the configured memory size need to be increased,
> > thus,
> > > not sure about rendering it in red.
> > >
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Tue, Feb 25, 2020 at 3:13 PM Yadong Xie 
> wrote:
> > >
> > >> Hi all
> > >> we have updated the POC web, and added unit to GC metrics
> > >> check it here http://101.132.122.69:8081/web/#/job-manager/metrics
> > >> thanks for all the response
> > >>
> > >> Jark Wu  于2020年2月24日周一 下午8:48写道:
> > >>
> > >>> Hi Yadong,
> > >>>
> > >>> > what is the boundary between red and green?
> > >>> Yes. I think that's the point we need to discuss. My gut feeling is
> > >>> "<60%"
> > >>> => green, "60%~80%" => yellow, ">80%" => red.
> > >>> But I guess directed memory is always 100%, so it is not suitable for
> > >>> that?
> > >>> Maybe @Xintong Song  has a better
> understanding
> > >>> on
> > >>> the memory threshold.
> > >>>
> > >>> Best,
> > >>> Jark
> > >>>
> > >>> On Mon, 24 Feb 2020 at 15:41, Yadong Xie 
> wrote:
> > >>>
> > >>> > Hi Jark
> > >>> > thanks for your suggestion
> > >>> >
> > >>> > > I think we can use different color to distinguish the memory
> usage
> > >>> (from
> > >>> > green to red?).
> > >>> >
> > >>> > It is a good idea, but what is the boundary between red and green?
> > >>> giving a
> > >>> > magic number boundary may mislead the users. any suggestions?
> > >>> >
> > >>> > > Besides, I think we should add an unit on the "Garbage
> Collection"
> > ->
> > >>> > "Time", it's hard to know what the value mean. Would be better to
> > >>> display
> > >>> > the value like "10ms", "5ns".
> > >>> >
> > >>> > I will add the unit later, thanks for your advice.
> > >>> >
> > >>> >
> > >>> > Xintong Song  于2020年2月21日周五 下午6:02写道:
> > >>> >
> > >>> > > FYI, there's an effort planned for 1.11 to improve the memory
> > >>> > configuration
> > >>> > > of the Flink master process, similar to FLIP-49 but definitely
> less
> > >>> > > complexity.
> > >>> > >
> > >>> > > I would not consider the memory configuration improvement as a
> > >>> blocker
> > >>> > for
> > >>> > > this effort. As far as I can see, there's nothing in conflict.
> Just
> > >>> after
> > >>> > > the memory configuration improvement, we might be able to present
> > >>> more
> > >>> > > information on the JM metrics page, which are tightly
> corresponding
> > >>> to
> > >>> > the
> > >>> > > configuration options, like what we planned for the TM metrics
> page
> > >>> in
> > >>> > > FLIP-102. Therefore, it might make sense to proceed 

[jira] [Created] (FLINK-16288) Setting the TTL for discarding task pods on Kubernetes.

2020-02-26 Thread Niels Basjes (Jira)
Niels Basjes created FLINK-16288:


 Summary: Setting the TTL for discarding task pods on Kubernetes.
 Key: FLINK-16288
 URL: https://issues.apache.org/jira/browse/FLINK-16288
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Kubernetes
Affects Versions: 1.10.0
Reporter: Niels Basjes


I'm experimenting with running Flink 1.10.0 on native Kubernetes (version 1.17).

After a job ends, the task pods that were used to run it are discarded quite 
quickly.

I found that if my job goes wrong I have too little time to look at all of the 
logs.

I propose adding a new config setting for Flink on Kubernetes that lets me set 
the minimum time before an idle task pod is discarded.

That way I can start Flink with a pod TTL of an hour (or something like that) 
so I have enough time to go through the logs and figure out what I did wrong.
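
As a rough sketch of how this could look from the user side — the option key
below is purely hypothetical, since the proposed setting does not exist yet:

{code:java}
import org.apache.flink.configuration.Configuration;

public class PodTtlExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Hypothetical key for the proposed setting: keep idle task pods
        // around for an hour before discarding them.
        conf.setString("kubernetes.taskmanager.idle-pod-ttl", "1 h");
        System.out.println(conf);
    }
}
{code}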



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: TIME/TIMESTAMP parse in Flink TABLE/SQL API

2020-02-26 Thread Jark Wu
Hi Dawid,

I agree with you. If we want to loosen the format constraint, the
important piece is the conversion matrix.

The conversion matrix you listed makes sense to me. From my understanding,
there should be six combinations.
We can add WITHOUT TIMEZONE => WITHOUT TIMEZONE and WITH TIMEZONE => WITH
TIMEZONE to make the matrix complete.
When the community reaches an agreement on this, we should write it down in
the documentation and follow the matrix in all text-based formats.
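
To make the matrix concrete, here is a small sketch of the conversions in
java.time terms, assuming TIMESTAMP WITHOUT TIME ZONE, WITH TIME ZONE, and
WITH LOCAL TIME ZONE map to LocalDateTime, OffsetDateTime, and Instant
respectively (this is an illustration, not the format implementation):

```
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;

public class TimestampConversions {
    public static void main(String[] args) {
        // Parsed WITHOUT TIMEZONE, e.g. the SQL literal '2019-07-09 02:02:00.040'.
        LocalDateTime withoutTz = LocalDateTime.parse("2019-07-09T02:02:00.040");

        // WITHOUT TIMEZONE => WITH TIMEZONE: store the local time zone with the data.
        OffsetDateTime withTz = withoutTz.atZone(ZoneId.systemDefault()).toOffsetDateTime();

        // WITHOUT TIMEZONE => WITH LOCAL TIMEZONE: interpret the time in the local zone.
        Instant withLocalTz = withoutTz.atZone(ZoneId.systemDefault()).toInstant();

        // WITH TIMEZONE => WITH LOCAL TIMEZONE: convert and drop the zone information.
        Instant converted = withTz.toInstant();

        // WITH TIMEZONE => WITHOUT TIMEZONE: drop the zone, keep the wall-clock value.
        LocalDateTime dropped = withTz.toLocalDateTime();

        System.out.println(withoutTz + " " + withTz + " " + withLocalTz
                + " " + converted + " " + dropped);
    }
}
```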

Regarding the RFC 3339 compatibility mode switch, that also sounds good to
me.

Best,
Jark

On Wed, 26 Feb 2020 at 17:44, Dawid Wysakowicz 
wrote:

> Hi all,
>
> @NiYanchun Thank you for reporting this. Yes I think we could improve the
> behaviour of the JSON format.
>
> @Jark First of all I do agree we could/should improve the
> "user-friendliness" of the JSON format (and unify the behavior across text
> based formats). I am not sure though if it is as simple as just ignore the
> time zone here.
>
> My suggestion would be rather to apply the logic of parsing a SQL
> timestamp literal (if the expected type is of LogicalTypeFamily.TIMESTAMP),
> which would actually also derive the "stored" type of the timestamp (either
> WITHOUT TIMEZONE or WITH TIMEZONE) and then apply a proper sql conversion.
> Therefore:
>
> parsed type      | requested type      | behaviour
> WITHOUT TIMEZONE | WITH TIMEZONE       | store the local timezone with the data
> WITHOUT TIMEZONE | WITH LOCAL TIMEZONE | do nothing with the data, interpret the time in the local timezone
> WITH TIMEZONE    | WITH LOCAL TIMEZONE | convert the timestamp to the local timezone and drop the time zone information
> WITH TIMEZONE    | WITHOUT TIMEZONE    | drop the time zone information
>
> It might just boil down to what you said "being more lenient with regards
> to parsing the time zone". Nevertheless I think this way it is a bit better
> defined behaviour, especially as it has a defined behaviour when converting
> between representation with or without time zone.
>
> An implementation note. I think we should aim to base the implementation
> on the DataTypes already rather than going back to the TypeInformation.
>
> I would still try to leave the RFC 3339 compatibility mode, but maybe for
> that mode it would make sense to not support any types WITHOUT TIMEZONE?
> This would be enabled with a switch (disabled by default). As I understand
> the RFC, making the time zone mandatory is actually a big part of the
> standard as it makes time types unambiguous.
>
> What do you think?
>
> Ps. I cross posted this on the dev ML.
>
> Best,
>
> Dawid
>
>
> On 26/02/2020 03:45, Jark Wu wrote:
>
> Yes, I'm also in favor of loosening the datetime format constraint.
> I guess most of the users don't know there is a JSON standard which
> follows RFC 3339.
>
> Best,
> Jark
>
> On Wed, 26 Feb 2020 at 10:06, NiYanchun  wrote:
>
>> Yes, these type definitions are general. As a user/developer, I would
>> support “loosening it for usability”. If not, we may need to add some
>> explanation about JSON.
>>
>>
>>
>>  Original Message
>> *Sender:* Jark Wu
>> *Recipient:* Outlook; Dawid Wysakowicz <dwysakow...@apache.org>
>> *Cc:* godfrey he; Leonard Xu; user
>> *Date:* Wednesday, Feb 26, 2020 09:55
>> *Subject:* Re: TIME/TIMESTAMP parse in Flink TABLE/SQL API
>>
>> Hi Outlook,
>>
>> The explanation in DataTypes is correct; it is compliant with the SQL standard.
>> The problem is that JsonRowDeserializationSchema only supports RFC 3339.
>> On the other hand, CsvRowDeserializationSchema supports parsing
>> "2019-07-09 02:02:00.040".
>>
>> So the question is shall we insist on the RFC-3339 "standard"? Shall we
>> loosen it for usability?
>> What do you think @Dawid Wysakowicz  ?
>>
>> Best,
>> Jark
>>
>> On Wed, 26 Feb 2020 at 09:29, Outlook  wrote:
>>
>>> Thanks Godfrey and Leonard, I tried your answers, result is OK.
>>>
>>>
>>> BTW, I think that if only this format is accepted in the long term, the TIME
>>> and TIMESTAMP methods' docs in `org.apache.flink.table.api.DataTypes` should
>>> be updated,
>>>
>>> because the documentation does not match what the methods really support. For
>>> example,
>>>
>>>
>>> ```
>>> /**
>>> * Data type of a time WITHOUT time zone {@code TIME} with no fractional
>>> seconds by default.
>>> *
>>> * An instance consists of {@code hour:minute:second} with up to
>>> second precision
>>> * and values ranging from {@code 00:00:00} to {@code 23:59:59}.
>>> *
>>> * Compared to the SQL standard, leap seconds (23:59:60 and 23:59:61)
>>> are not supported as the
>>> * semantics are closer to {@link java.time.LocalTime}. A time WITH time
>>> zone is not provided.
>>> *
>>> * @see #TIME(int)
>>> * @see TimeType
>>> */
>>> public static DataType TIME() {
>>>     return new AtomicDataType(new TimeType());
>>> }
>>> ```
>>>
>>>
>>> Thanks again.
>>>
>>>  Original Message
>>> *Sender:* Leonard Xu
>>> *Recipient:* godfrey he
>>> *Cc:* Outlook; 

Re: [VOTE] FLIP-102: Add More Metrics to TaskManager

2020-02-26 Thread Yadong Xie
Hi Till

Thanks a lot for your response

> 2. I'm not entirely sure whether I would split the memory ...

Splitting the memory display comes from the 'ancient' design of the web UI;
it is OK for me to change it to follow the total/heap/managed/network/direct/
JVM overhead/mapped sequence.

> 3. Displaying the memory configurations...

I agree with you that it is not a very nice way, but the hierarchical
relationship of the configurations is too complex and hard to display in
other ways (I have tried).

If anyone has a better idea, please don't hesitate to help me.


> 4. What does JVM limit mean in Non-heap.JVM-Overhead?

JVM limit is the "non-heap max metric minus the metaspace configuration", as
@Xintong Song replied in this mail thread.
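
For illustration, a sketch of how such a value can be derived from the
standard JVM memory beans (the metaspace size here is a placeholder for
whatever -XX:MaxMetaspaceSize is configured to):

```
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class NonHeapJvmLimit {
    public static void main(String[] args) {
        MemoryUsage nonHeap = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
        long metaspaceBytes = 96L << 20; // placeholder for the configured metaspace size

        // getMax() returns -1 when no upper bound is set for non-heap memory,
        // in which case no "JVM limit" can be derived this way.
        long max = nonHeap.getMax();
        long jvmLimit = max < 0 ? -1 : max - metaspaceBytes;
        System.out.println("JVM limit (non-heap max minus metaspace): " + jvmLimit);
    }
}
```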


Till Rohrmann  于2020年2月25日周二 下午6:58写道:

> Thanks for creating this FLIP Yadong. I think your proposal makes it much
> easier for the user to understand what's happening on Flink TaskManager's.
>
> I have some comments:
>
> 1. Some of the newly introduced metrics involve computations on the
> TaskManager. I would like to avoid additional computations introduced by
> metrics as much as possible because metrics should not affect the system.
> In particular, total memory sizes which are configured should not be
> derived computationally (getManagedMemoryTotal, getTotalMemorySize). For
> the currently available memory sizes (e.g. getManagedMemoryUsed), one could
> think about reporting them on a per slot basis and to do the aggregation on
> the client side. Of course, this would increase the size of the response
> payload.
>
> 2. I'm not entirely sure whether I would split the memory display into JVM
> memory and non JVM memory as you've done it int the POC. From a user's
> perspective, one could start displaying the total process memory. The next
> three most important metrics are the heap, managed memory and network
> buffer usage, I guess. If one is interested in more details, one could then
> display the remaining direct memory usage, the JVM overhead (I'm not sure
> whether I would call this non-heap though) and the mapped memory.
>
> 3. Displaying the memory configurations in three nested boxes does not look
> so nice to me. I'm not sure how else one could display it, though.
>
> 4. What does JVM limit mean in Non-heap.JVM-Overhead?
>
> Cheers,
> Till
>
> On Tue, Feb 25, 2020 at 8:19 AM Yadong Xie  wrote:
>
> > Hi Xintong
> > thanks for your advice, the POC web and the FLIP doc was updated now
> > here is the new link:
> >
> >
> http://101.132.122.69:8081/web/#/task-manager/7e7cf0293645c8537caab915c829aa73/metrics
> >
> >
> > Xintong Song  于2020年2月21日周五 下午12:00写道:
> >
> > > >
> > > > 1. Should the managed memory be part of direct memory?
> > > >
> > > The answer is no. Managed memory is currently allocated by accessing to
> > > private field of Unsafe. It is not accounted for in JVM's direct memory
> > > limit and corresponding metrics. To that end, it is equivalent to
> > > native memory.
> > >
> > >
> > > > 2. Should the shuffle memory also be part of the managed memory?
> > >
> > > I don't think so. Shuffle (Network) memory is allocated with direct
> > > buffers, and accounted for in JVM's direct memory limit and
> corresponding
> > > metrics. Moreover, the FLIP-49 memory model expose network memory and
> > > managed memory as two independent components of the overall memory
> > > footprint.
> > >
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Fri, Feb 21, 2020 at 11:45 AM Kurt Young  wrote:
> > >
> > > > Some questions related to "managed memory":
> > > >
> > > > 1. Should the managed memory be part of direct memory?
> > > > 2. Should the shuffle memory also be part of the managed memory?
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Fri, Feb 21, 2020 at 10:41 AM Xintong Song  >
> > > > wrote:
> > > >
> > > > > Thanks for driving this FLIP, Yadong.
> > > > >
> > > > > +1 (non-binding) for the FLIP in general. I think this really helps
> > our
> > > > > users to understand and use the new FLIP-49 memory configuration.
> > > > >
> > > > > I have a few minor comments.
> > > > > - There's a frame "Other" in the frame "Non-Heap", besides "JVM
> > > Overhead"
> > > > > and "JVM Metaspace". IIUC, the purpose of this is to explain the
> > > > > mismatching between the metric "non-heap maximum" and the sum of
> the
> > > > > configurations "JVM metaspace" & "JVM Overhead". However, from the
> > > > > perspective of FLIP-49, JVM Overhead accounts for all the JVM
> > non-heap
> > > > > memory usages except for metaspace. The metrics does not match the
> > > > > configuration because we did not set the a JVM parameter for "max
> > > > non-heap
> > > > > memory" (actually I'm not sure whether it can be specified in java
> > 8).
> > > > The
> > > > > current UI might confuse people making them think there are other
> > > > non-heap
> > > > > memory usages not accounted by the configurations. Therefore, I
> would
> > > > > suggest to remove the "Other" 

[jira] [Created] (FLINK-16287) ES6 sql jar relocates log4j2

2020-02-26 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-16287:


 Summary: ES6 sql jar relocates log4j2
 Key: FLINK-16287
 URL: https://issues.apache.org/jira/browse/FLINK-16287
 Project: Flink
  Issue Type: Bug
  Components: Build System, Connectors / ElasticSearch
Affects Versions: 1.11.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.11.0


{{flink-sql-connector-elasticsearch6}} still defines a relocation rule for 
log4j2, but this dependency is no longer bundled and instead provided by 
flink-dist.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-102: Add More Metrics to TaskManager

2020-02-26 Thread lining jing
Hi Till
Thanks for your response!

I'm responsible for the REST API design part of FLIP-102.

1. Some of the newly introduced metrics involve computations on the
> TaskManager. I would like to avoid additional computations introduced by
> metrics as much as possible because metrics should not affect the system.
> In particular, total memory sizes which are configured should not be
> derived computationally (getManagedMemoryTotal, getTotalMemorySize). For
> the currently available memory sizes (e.g. getManagedMemoryUsed), one could
> think about reporting them on a per slot basis and to do the aggregation on
> the client side. Of course, this would increase the size of the response
> payload.


I totally agree with your comment, but I still have a question: where
should the metric for a slot's managed memory be registered?

There are two ways to achieve this:

   1. add a dedicated SlotMetricGroup
   2. register it in the TaskManagerMetricGroup, e.g. as 0.Managed.Memory.Used
(where 0 is the index of the slot)

Which way do you think is better? (A rough sketch of the second option is
below.) Looking forward to your reply.
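
For reference, a rough sketch of the second option using the existing
MetricGroup API (the MemoryTracker interface is just a placeholder for
whichever component actually tracks a slot's managed memory):

```
import org.apache.flink.metrics.Gauge;
import org.apache.flink.metrics.MetricGroup;

public class SlotManagedMemoryMetrics {

    /** Placeholder for the component that knows a slot's managed memory usage. */
    public interface MemoryTracker {
        long getUsedBytes();
    }

    // Option 2: register the gauge under the TaskManager group with the slot
    // index as a sub-group, yielding a metric like "0.Managed.Memory.Used".
    public static void register(MetricGroup taskManagerGroup,
                                int slotIndex,
                                MemoryTracker tracker) {
        taskManagerGroup
                .addGroup(String.valueOf(slotIndex))
                .addGroup("Managed")
                .addGroup("Memory")
                .gauge("Used", (Gauge<Long>) tracker::getUsedBytes);
    }
}
```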

Till Rohrmann  于2020年2月25日周二 下午6:58写道:

> Thanks for creating this FLIP Yadong. I think your proposal makes it much
> easier for the user to understand what's happening on Flink TaskManager's.
>
> I have some comments:
>
> 1. Some of the newly introduced metrics involve computations on the
> TaskManager. I would like to avoid additional computations introduced by
> metrics as much as possible because metrics should not affect the system.
> In particular, total memory sizes which are configured should not be
> derived computationally (getManagedMemoryTotal, getTotalMemorySize). For
> the currently available memory sizes (e.g. getManagedMemoryUsed), one could
> think about reporting them on a per slot basis and to do the aggregation on
> the client side. Of course, this would increase the size of the response
> payload.
>
> 2. I'm not entirely sure whether I would split the memory display into JVM
> memory and non JVM memory as you've done it int the POC. From a user's
> perspective, one could start displaying the total process memory. The next
> three most important metrics are the heap, managed memory and network
> buffer usage, I guess. If one is interested in more details, one could then
> display the remaining direct memory usage, the JVM overhead (I'm not sure
> whether I would call this non-heap though) and the mapped memory.
>
> 3. Displaying the memory configurations in three nested boxes does not look
> so nice to me. I'm not sure how else one could display it, though.
>
> 4. What does JVM limit mean in Non-heap.JVM-Overhead?
>
> Cheers,
> Till
>
> On Tue, Feb 25, 2020 at 8:19 AM Yadong Xie  wrote:
>
> > Hi Xintong
> > thanks for your advice, the POC web and the FLIP doc was updated now
> > here is the new link:
> >
> >
> http://101.132.122.69:8081/web/#/task-manager/7e7cf0293645c8537caab915c829aa73/metrics
> >
> >
> > Xintong Song  于2020年2月21日周五 下午12:00写道:
> >
> > > >
> > > > 1. Should the managed memory be part of direct memory?
> > > >
> > > The answer is no. Managed memory is currently allocated by accessing to
> > > private field of Unsafe. It is not accounted for in JVM's direct memory
> > > limit and corresponding metrics. To that end, it is equivalent to
> > > native memory.
> > >
> > >
> > > > 2. Should the shuffle memory also be part of the managed memory?
> > >
> > > I don't think so. Shuffle (Network) memory is allocated with direct
> > > buffers, and accounted for in JVM's direct memory limit and
> corresponding
> > > metrics. Moreover, the FLIP-49 memory model expose network memory and
> > > managed memory as two independent components of the overall memory
> > > footprint.
> > >
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Fri, Feb 21, 2020 at 11:45 AM Kurt Young  wrote:
> > >
> > > > Some questions related to "managed memory":
> > > >
> > > > 1. Should the managed memory be part of direct memory?
> > > > 2. Should the shuffle memory also be part of the managed memory?
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Fri, Feb 21, 2020 at 10:41 AM Xintong Song  >
> > > > wrote:
> > > >
> > > > > Thanks for driving this FLIP, Yadong.
> > > > >
> > > > > +1 (non-binding) for the FLIP in general. I think this really helps
> > our
> > > > > users to understand and use the new FLIP-49 memory configuration.
> > > > >
> > > > > I have a few minor comments.
> > > > > - There's a frame "Other" in the frame "Non-Heap", besides "JVM
> > > Overhead"
> > > > > and "JVM Metaspace". IIUC, the purpose of this is to explain the
> > > > > mismatching between the metric "non-heap maximum" and the sum of
> the
> > > > > configurations "JVM metaspace" & "JVM Overhead". However, from the
> > > > > perspective of FLIP-49, JVM Overhead accounts for all the JVM
> > non-heap
> > > > > memory usages except for metaspace. The metrics does not match the
> > > > > configuration because we did not set the a JVM parameter for 

Re: TIME/TIMESTAMP parse in Flink TABLE/SQL API

2020-02-26 Thread Dawid Wysakowicz
Hi all,

@NiYanchun Thank you for reporting this. Yes I think we could improve
the behaviour of the JSON format.

@Jark First of all I do agree we could/should improve the
"user-friendliness" of the JSON format (and unify the behavior across
text-based formats). I am not sure though if it is as simple as just
ignoring the time zone here.

My suggestion would rather be to apply the logic of parsing a SQL
timestamp literal (if the expected type is of
LogicalTypeFamily.TIMESTAMP), which would actually also derive the
"stored" type of the timestamp (either WITHOUT TIMEZONE or WITH
TIMEZONE) and then apply a proper SQL conversion. Therefore:

parsed type      | requested type      | behaviour
WITHOUT TIMEZONE | WITH TIMEZONE       | store the local timezone with the data
WITHOUT TIMEZONE | WITH LOCAL TIMEZONE | do nothing with the data, interpret the time in the local timezone
WITH TIMEZONE    | WITH LOCAL TIMEZONE | convert the timestamp to the local timezone and drop the time zone information
WITH TIMEZONE    | WITHOUT TIMEZONE    | drop the time zone information

It might just boil down to what you said, "being more lenient with
regard to parsing the time zone". Nevertheless, I think this way the
behaviour is better defined, especially as it specifies what happens
when converting between representations with and without time zones.

An implementation note: I think we should aim to base the implementation
on the DataTypes right away rather than going back to the TypeInformation.

I would still try to keep the RFC 3339 compatibility mode, but maybe
for that mode it would make sense not to support any types WITHOUT
TIMEZONE? This would be enabled with a switch (disabled by default). As
I understand the RFC, making the time zone mandatory is actually a big
part of the standard, as it makes time types unambiguous.
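
As a small illustration in java.time terms of why the mandatory zone matters
(a sketch, not the format implementation): an RFC 3339 time carries an
explicit offset, which the SQL-style literal lacks:

```
import java.time.LocalTime;
import java.time.OffsetTime;

public class Rfc3339TimeExample {
    public static void main(String[] args) {
        // SQL-style literal as documented for DataTypes.TIME(): no zone information.
        LocalTime sqlLiteral = LocalTime.parse("02:02:00.040");

        // RFC 3339 time: the offset ("Z" here) is mandatory, making the value unambiguous.
        OffsetTime rfc3339 = OffsetTime.parse("02:02:00.040Z");

        System.out.println(sqlLiteral + " vs " + rfc3339);
    }
}
```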

What do you think?

Ps. I cross posted this on the dev ML.

Best,

Dawid


On 26/02/2020 03:45, Jark Wu wrote:
> Yes, I'm also in favor of loosening the datetime format constraint.
> I guess most of the users don't know there is a JSON standard which
> follows RFC 3339.
>
> Best,
> Jark
>
> On Wed, 26 Feb 2020 at 10:06, NiYanchun wrote:
>
> Yes, these type definitions are general. As a user/developer, I
> would support “loosening it for usability”. If not, we may need to add
> some explanation about JSON.
>
>
>
>  Original Message 
> *Sender:* Jark Wu <imj...@gmail.com>
> *Recipient:* Outlook; Dawid Wysakowicz <dwysakow...@apache.org>
> *Cc:* godfrey he; Leonard Xu; user
> *Date:* Wednesday, Feb 26, 2020 09:55
> *Subject:* Re: TIME/TIMESTAMP parse in Flink TABLE/SQL API
>
> Hi Outlook,
>
> The explanation in DataTypes is correct; it is compliant with the SQL
> standard. The problem is that JsonRowDeserializationSchema only
> supports RFC 3339.
> On the other hand, CsvRowDeserializationSchema supports parsing
> "2019-07-09 02:02:00.040".
>
> So the question is shall we insist on the RFC-3339 "standard"?
> Shall we loosen it for usability? 
> What do you think @Dawid Wysakowicz  ?
>
> Best,
> Jark
>
> On Wed, 26 Feb 2020 at 09:29, Outlook wrote:
>
> Thanks Godfrey and Leonard, I tried your answers, result is OK. 
>
>
> BTW, I think that if only this format is accepted in the long term, the
> TIME and TIMESTAMP methods' docs in
> `org.apache.flink.table.api.DataTypes` should be updated,
>
> because the documentation does not match what the methods really
> support. For example,
>
>
> ```
> /**
>  * Data type of a time WITHOUT time zone {@code TIME} with no fractional
>  * seconds by default.
>  *
>  * An instance consists of {@code hour:minute:second} with up to second
>  * precision and values ranging from {@code 00:00:00} to {@code 23:59:59}.
>  *
>  * Compared to the SQL standard, leap seconds (23:59:60 and 23:59:61) are
>  * not supported as the semantics are closer to {@link java.time.LocalTime}.
>  * A time WITH time zone is not provided.
>  *
>  * @see #TIME(int)
>  * @see TimeType
>  */
> public static DataType TIME() {
>     return new AtomicDataType(new TimeType());
> }
> ```
>
>
> Thanks again.
>
>
>  Original Message 
> *Sender:* Leonard Xu <xbjt...@gmail.com>
> *Recipient:* godfrey he
> *Cc:* Outlook; user
> *Date:* Tuesday, Feb 25, 2020 22:56
> *Subject:*