Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

2019-09-05 Thread JingsongLee
Thanks Dawid, +1 for this approach.

One concern is the removal of registerTableSink & registerTableSource
from TableEnvironment. There are two alternatives:
1. The properties approach (DDL, descriptors).
2. from/toDataStream.

#1 can only carry properties, not Java state, and for some connectors it is
difficult to express all of their state as properties.
#2 can carry Java state, but cannot use TableSource-related features such as
projection & filter push-down, partition support, etc.
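To make the trade-off concrete, here is a rough sketch of the two alternatives
(API calls follow Flink 1.9; the connector properties, table names and data are
purely illustrative):

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class RegistrationAlternativesSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Alternative 1: properties only (DDL / descriptors). Everything must be
        // expressible as string properties; the keys below are illustrative.
        tEnv.sqlUpdate(
            "CREATE TABLE my_sink (f0 BIGINT, f1 VARCHAR) WITH (" +
            "  'connector.type' = 'filesystem'," +
            "  'connector.path' = '/tmp/out'," +
            "  'format.type' = 'csv')");

        // Alternative 2: from/toDataStream. The stream (and its source/sink) may hold
        // arbitrary Java state, but TableSource features such as projection/filter
        // push-down and partition support are lost.
        DataStream<Tuple2<Long, String>> stream = env.fromElements(Tuple2.of(1L, "a"));
        Table table = tEnv.fromDataStream(stream, "f0, f1");
        tEnv.registerTable("my_source", table);
    }
}
```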

Any idea about this?

Best,
Jingsong Lee


--
From:Dawid Wysakowicz 
Send Time: Wednesday, September 4, 2019, 22:20
To:dev 
Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Hi all,
As part of FLIP-30 a Catalog API was introduced that enables storing table meta 
objects permanently. At the same time the majority of current APIs create 
temporary objects that cannot be serialized. We should clarify the creation of 
meta objects (tables, views, functions) in a unified way.
Another current problem in the API is that all the temporary objects are stored 
in a special built-in catalog, which is not very intuitive for many users, as 
they must be aware of that catalog to reference temporary objects.
Lastly, different APIs have different ways of providing object paths:

  - String path…
  - String path, String pathContinued…
  - String name

We should choose one approach and unify it across all APIs.
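For illustration, the three shapes look roughly like this (illustrative only, not
the exact Flink 1.9 signatures):

```java
import org.apache.flink.table.api.Table;

// Illustrative shapes only -- not the exact Flink 1.9 signatures.
interface ObjectPathStyles {
    Table scan(String... tablePath);                        // "String path..." style
    void insertInto(String path, String... pathContinued);  // "String path, String pathContinued..." style
    void registerTable(String name, Table table);           // plain "String name" style
}
```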
I suggest a FLIP to address the above issues.
Looking forward to your opinions.
FLIP link: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module



Re: [DISCUSS] FLIP-66: Support time attribute in SQL DDL

2019-09-05 Thread Dian Fu
Hi Jark,

Thanks for bringing up this discussion and the detailed design doc. This is 
definitely a critical feature for streaming SQL jobs. I have left a few 
comments in the design doc.

Thanks,
Dian

> On Sep 6, 2019, at 11:48 AM, Forward Xu  wrote:
> 
> Thanks Jark for this topic, This will be very useful.
> 
> 
> Best,
> 
> ForwardXu
> 
> Danny Chan  wrote on Fri, Sep 6, 2019 at 11:26 AM:
> 
>> Thanks Jark for bring up this topic, this is definitely an import feature
>> for the SQL, especially the DDL users.
>> 
>> I would spend some time to review this design doc, really thanks.
>> 
>> Best,
>> Danny Chan
>> On Sep 6, 2019 at 11:19 AM +0800, Jark Wu  wrote:
>>> Hi everyone,
>>> 
>>> I would like to start discussion about how to support time attribute in
>> SQL
>>> DDL.
>>> In Flink 1.9, we already introduced a basic SQL DDL to create a table.
>>> However, it doesn't support to define time attributes. This makes users
>>> can't
>>> apply window operations on the tables created by DDL which is a bad
>>> experience.
>>> 
>>> In FLIP-66, we propose a syntax for watermark to define rowtime attribute
>>> and propose to use computed column syntax to define proctime attribute.
>>> But computed column is another big topic and should deserve a separate
>>> FLIP.
>>> If we have a consensus on the computed column approach, we will start
>>> computed column FLIP soon.
>>> 
>>> FLIP-66:
>>> 
>> https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit#
>>> 
>>> Thanks for any feedback!
>>> 
>>> Best,
>>> Jark
>> 



Re: [DISCUSS] Support notifyOnMaster for notifyCheckpointComplete

2019-09-05 Thread shimin yang
Hi Jingsong,

Big fan of this idea. We faced the same problem and resolved it by adding a
distributed lock. It would be nice to have this feature in the JobMaster, which
could replace the lock.

Best,
Shimin

JingsongLee  wrote on Fri, Sep 6, 2019 at 12:20 PM:

> Hi devs:
>
> I try to implement streaming file sink for table[1] like StreamingFileSink.
> If the underlying is a HiveFormat, or a format that updates visibility
>  through a metaStore, I have to update the metaStore in the
>  notifyCheckpointComplete, but this operation occurs on the task side,
> which will lead to distributed access to the metaStore, which will
>  lead to bottleneck.
>
> So I'm curious if we can support notifyOnMaster for
> notifyCheckpointComplete like FinalizeOnMaster.
>
> What do you think?
>
> [1]
> https://docs.google.com/document/d/15R3vZ1R_pAHcvJkRx_CWleXgl08WL3k_ZpnWSdzP7GY/edit?usp=sharing
>
> Best,
> Jingsong Lee


[DISCUSS] Support customize state in customized KeyedStateBackend

2019-09-05 Thread shimin yang
Hi everyone,

I would like to start a discussion on supporting customized state
in customized KeyedStateBackends.

In Flink, users can customize a KeyedStateBackend to support different types
of data stores. Although we can implement customized StateDescriptors for
different kinds of data structures, we do not really have access to such
customized state in RichFunctions.

I propose to add a getOtherState method to RuntimeContext and
DefaultKeyedStateStore which directly takes a StateDescriptor as a parameter,
to allow users to get customized state.
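A minimal sketch of what that could look like (the method name follows the
proposal above; the generic signature is only illustrative):

```java
import org.apache.flink.api.common.state.State;
import org.apache.flink.api.common.state.StateDescriptor;

// Sketch of the proposed addition to RuntimeContext / DefaultKeyedStateStore.
interface RuntimeContextAdditionSketch {
    // Hands the descriptor straight to the (customized) keyed state backend, so that
    // state types unknown to the framework become reachable from RichFunctions.
    <S extends State, T> S getOtherState(StateDescriptor<S, T> stateDescriptor);
}
```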

What do you think?

Thanks.

Best,
Shimin


[DISCUSS] FLIP-63: Rework table partition support

2019-09-05 Thread JingsongLee
Hi everyone, thank you for your comments. The mail subject was updated
and streaming-related concepts were added.

We would like to start a discussion thread on "FLIP-63: Rework table
partition support" (design doc: [1]), where we describe how to support
partitions in Flink and how to integrate with Hive partitions.

This FLIP addresses:
   - Introduce the whole story of partition support.
   - Introduce and discuss the DDL for partition support.
   - Introduce static and dynamic partition insert (see the sketch after this list).
   - Introduce partition pruning.
   - Introduce the dynamic partition implementation.
   - Introduce FileFormatSink to deal with streaming exactly-once and
     partition-related logic.
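As a taste of the insert syntax discussed in the document (a sketch only; settling
the exact DDL/DML is part of this FLIP, and the tables below are hypothetical):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class PartitionInsertSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());

        // Assumes a table `sales` partitioned by `dt` and a source table `src` have
        // been registered beforehand (e.g. via DDL) -- both are hypothetical.

        // Static partition insert: the partition value is fixed in the statement.
        tEnv.sqlUpdate("INSERT INTO sales PARTITION (dt='2019-09-05') SELECT id, amount FROM src");

        // Dynamic partition insert: the partition value comes from the query itself.
        tEnv.sqlUpdate("INSERT INTO sales SELECT id, amount, dt FROM src");
    }
}
```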

Details can be seen in the design document.
Looking forward to your feedback. Thank you.

[1] 
https://docs.google.com/document/d/15R3vZ1R_pAHcvJkRx_CWleXgl08WL3k_ZpnWSdzP7GY/edit?usp=sharing

Best,
Jingsong Lee

[DISCUSS] Support notifyOnMaster for notifyCheckpointComplete

2019-09-05 Thread JingsongLee
Hi devs:

I am trying to implement a streaming file sink for tables [1], similar to
StreamingFileSink. If the underlying format is a HiveFormat, or any format that
updates visibility through a metastore, I have to update the metastore in
notifyCheckpointComplete. However, this operation happens on the task side,
which leads to distributed access to the metastore and can become a bottleneck.

So I'm curious whether we can support notifyOnMaster for
notifyCheckpointComplete, similar to FinalizeOnMaster.
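By analogy with FinalizeOnMaster, such a hook might look roughly like the
following (the interface name and signature are purely illustrative; nothing like
this exists yet):

```java
// Purely illustrative sketch, modelled after org.apache.flink.api.common.io.FinalizeOnMaster.
interface NotifyCheckpointCompleteOnMaster {

    /**
     * Called once per completed checkpoint on the master (JobMaster) side, so that e.g.
     * a metastore can be updated from a single place instead of from every parallel task.
     */
    void notifyCheckpointCompleteOnMaster(long checkpointId) throws Exception;
}
```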

What do you think?

[1] 
https://docs.google.com/document/d/15R3vZ1R_pAHcvJkRx_CWleXgl08WL3k_ZpnWSdzP7GY/edit?usp=sharing

Best,
Jingsong Lee

Re: [DISCUSS] FLIP-66: Support time attribute in SQL DDL

2019-09-05 Thread Forward Xu
Thanks Jark for this topic, this will be very useful.


Best,

ForwardXu

Danny Chan  wrote on Fri, Sep 6, 2019 at 11:26 AM:

> Thanks Jark for bring up this topic, this is definitely an import feature
> for the SQL, especially the DDL users.
>
> I would spend some time to review this design doc, really thanks.
>
> Best,
> Danny Chan
> On Sep 6, 2019 at 11:19 AM +0800, Jark Wu  wrote:
> > Hi everyone,
> >
> > I would like to start discussion about how to support time attribute in
> SQL
> > DDL.
> > In Flink 1.9, we already introduced a basic SQL DDL to create a table.
> > However, it doesn't support to define time attributes. This makes users
> > can't
> > apply window operations on the tables created by DDL which is a bad
> > experience.
> >
> > In FLIP-66, we propose a syntax for watermark to define rowtime attribute
> > and propose to use computed column syntax to define proctime attribute.
> > But computed column is another big topic and should deserve a separate
> > FLIP.
> > If we have a consensus on the computed column approach, we will start
> > computed column FLIP soon.
> >
> > FLIP-66:
> >
> https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit#
> >
> > Thanks for any feedback!
> >
> > Best,
> > Jark
>


Re: [DISCUSS] FLIP-66: Support time attribute in SQL DDL

2019-09-05 Thread Danny Chan
Thanks Jark for bringing up this topic; this is definitely an important feature
for SQL, especially for DDL users.

I will spend some time reviewing the design doc. Thanks again.

Best,
Danny Chan
On Sep 6, 2019 at 11:19 AM +0800, Jark Wu  wrote:
> Hi everyone,
>
> I would like to start discussion about how to support time attribute in SQL
> DDL.
> In Flink 1.9, we already introduced a basic SQL DDL to create a table.
> However, it doesn't support to define time attributes. This makes users
> can't
> apply window operations on the tables created by DDL which is a bad
> experience.
>
> In FLIP-66, we propose a syntax for watermark to define rowtime attribute
> and propose to use computed column syntax to define proctime attribute.
> But computed column is another big topic and should deserve a separate
> FLIP.
> If we have a consensus on the computed column approach, we will start
> computed column FLIP soon.
>
> FLIP-66:
> https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit#
>
> Thanks for any feedback!
>
> Best,
> Jark


[DISCUSS] FLIP-66: Support time attribute in SQL DDL

2019-09-05 Thread Jark Wu
Hi everyone,

I would like to start a discussion about how to support time attributes in SQL
DDL.
In Flink 1.9, we already introduced a basic SQL DDL to create tables.
However, it does not support defining time attributes, so users cannot apply
window operations on tables created via DDL, which is a bad experience.

In FLIP-66, we propose a watermark syntax to define the rowtime attribute
and propose to use computed-column syntax to define the proctime attribute.
Computed columns are another big topic and deserve a separate FLIP.
If we reach consensus on the computed-column approach, we will start the
computed-column FLIP soon.
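To make the proposal concrete, here is a sketch of the kind of DDL this would
enable (the exact keywords are what FLIP-66 and the computed-column follow-up
need to settle; the table and connector properties are hypothetical):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class TimeAttributeDdlSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

        // This DDL is what the FLIP proposes -- it does NOT work in Flink 1.9 yet.
        tEnv.sqlUpdate(
            "CREATE TABLE orders (" +
            "  order_id BIGINT," +
            "  order_time TIMESTAMP(3)," +
            // proctime attribute via a computed column (subject of the follow-up FLIP):
            "  proc_time AS PROCTIME()," +
            // rowtime attribute via a watermark definition (this FLIP):
            "  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND" +
            ") WITH ('connector.type' = 'kafka')");
    }
}
```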

FLIP-66:
https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit#

Thanks for any feedback!

Best,
Jark


[VOTE] FLIP-53: Fine Grained Operator Resource Management

2019-09-05 Thread Xintong Song
Hi all,

I would like to start the voting process for FLIP-53 [1], which has been
discussed and reached consensus in this thread [2].

This vote will be open for at least 72 hours (excluding weekends). I'll
try to close it on Sep. 11, 04:00 UTC, unless there is an objection or not
enough votes.

Thank you~

Xintong Song


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management

[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-53-Fine-Grained-Resource-Management-td31831.html


[jira] [Created] (FLINK-13986) Clean-up of legacy mode

2019-09-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-13986:


 Summary: Clean-up of legacy mode
 Key: FLINK-13986
 URL: https://issues.apache.org/jira/browse/FLINK-13986
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13983) Launch task executor with new memory calculation logics

2019-09-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-13983:


 Summary: Launch task executor with new memory calculation logics
 Key: FLINK-13983
 URL: https://issues.apache.org/jira/browse/FLINK-13983
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13985) Use native memory for managed memory.

2019-09-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-13985:


 Summary: Use native memory for managed memory.
 Key: FLINK-13985
 URL: https://issues.apache.org/jira/browse/FLINK-13985
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13982) Implement memory calculation logics

2019-09-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-13982:


 Summary: Implement memory calculation logics
 Key: FLINK-13982
 URL: https://issues.apache.org/jira/browse/FLINK-13982
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13984) Separate on-heap and off-heap managed memory pools

2019-09-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-13984:


 Summary: Separate on-heap and off-heap managed memory pools
 Key: FLINK-13984
 URL: https://issues.apache.org/jira/browse/FLINK-13984
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13981) Introduce a switch for enabling the new task executor memory configurations

2019-09-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-13981:


 Summary: Introduce a switch for enabling the new task executor 
memory configurations
 Key: FLINK-13981
 URL: https://issues.apache.org/jira/browse/FLINK-13981
 Project: Flink
  Issue Type: Sub-task
Reporter: Xintong Song






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13980) FLIP-49 Unified Memory Configuration for TaskExecutors

2019-09-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-13980:


 Summary: FLIP-49 Unified Memory Configuration for TaskExecutors
 Key: FLINK-13980
 URL: https://issues.apache.org/jira/browse/FLINK-13980
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Configuration, Runtime / Coordination
Affects Versions: 1.9.0
Reporter: Xintong Song
 Fix For: 1.10.0


This is the umbrella issue of 'FLIP-49: Unified Memory Configuration for 
TaskExecutors'.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: [VOTE] FLIP-49: Unified Memory Configuration for TaskExecutors

2019-09-05 Thread Xintong Song
Thanks all for the votes.
So far, we have

   - 4 binding +1 votes (Stephan, Andrey, Till, Zhijiang)
   - 2 non-binding +1 votes (Xintong, Yu)
   - No -1 votes

There are more than 3 binding +1 votes and no -1 votes, and the voting time
has passed. According to the new bylaws, I'm happy to announce that FLIP-49
is approved to be adopted by Apache Flink.

Regarding the minor points mentioned during the voting, if there's no objection,
I would like to update the FLIP document with the following:

   - Exclude JVM Overhead from '-XX:MaxDirectMemorySize' (see the sketch after this list)
   - Add a 'Follow-Up' section, with the follow-ups on web UI and
     documentation issues
   - Add a 'Limitations' section, with the Unsafe Java 12 support issue
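For readers following along, the first point roughly amounts to the sizing
relation below (a sketch of my reading of this thread, not actual Flink code;
option names were still being revisited at this point):

```java
// Not Flink code -- just the -XX:MaxDirectMemorySize sizing relation as discussed above.
// JVM Overhead is deliberately excluded; the parameter names are descriptive, not config keys.
final class DirectMemorySizingSketch {
    static long maxDirectMemorySize(long taskOffHeapMemory, long networkMemory) {
        // taskOffHeapMemory covers user direct and native memory (see the clarification quoted below);
        // networkMemory is what this thread also calls "shuffle" memory.
        return taskOffHeapMemory + networkMemory;
    }
}
```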


Thank you~

Xintong Song



On Fri, Sep 6, 2019 at 10:28 AM Xintong Song  wrote:

> +1 (non-binding) from my side.
>
> @Yu, thanks for the vote.
> - The current FLIP document already mentioned the issue that Unsafe is not
> supported in Java 12, in the section 'Unifying Explicit and Implicit Memory
> Allocation'. It makes sense to me to emphasize this by adding a separate
> limitation section.
> - I think we should also update the FLIP document if we change the config
> names later in PRs. But I would not consider this as a major change to the
> FLIP that requires another vote, especially when we already agreed during
> this vote to revisit the config names at implementation stage.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Fri, Sep 6, 2019 at 2:43 AM Yu Li  wrote:
>
>> +1 (non-binding)
>>
>> Minor:
>> 1. Is it worth a separate "Limitations" section to contain all relative
>> information like the Unsafe support issue in Java 12, just like many other
>> FLIPs?
>> 2. About the config names, if we change them later in PR, does it mean we
>> will need to update the FLIP document? If so, it seems we need another
>> vote
>> after the modification according to current bylaw? Or maybe we could just
>> create a subpage under the FLIP and only re-vote on that part later?
>>
>> Thanks.
>>
>> Best Regards,
>> Yu
>>
>>
>> On Thu, 5 Sep 2019 at 00:52, Stephan Ewen  wrote:
>>
>> > Let's not block on config key names, just go ahead and we figure this
>> out
>> > concurrently or on the PR later.
>> >
>> >
>> > On Wed, Sep 4, 2019 at 3:48 PM Stephan Ewen  wrote:
>> >
>> > > Maybe to clear up confusion about my suggestion:
>> > >
>> > > I would vote to keep the name of the config parameter
>> > > "taskmanager.memory.network" because it is the same key as currently
>> > (good
>> > > to not break things unless good reason) and there currently is no
>> case or
>> > > even a concrete follow-up where we would actually differentiate
>> between
>> > > different types of network memory.
>> > >
>> > > I would suggest to not prematurely rename this because of something
>> that
>> > > might happen in the future. Experience shows that its better to do
>> these
>> > > things when the actual change comes.
>> > >
>> > > Side note: I am not sure "shuffle" is a good term in this context. I
>> have
>> > > so far only heard that in batch contexts, which is not what we do
>> here.
>> > One
>> > > more reason for me to not pre-maturely change names.
>> > >
>> > > On Wed, Sep 4, 2019 at 10:56 AM Xintong Song 
>> > > wrote:
>> > >
>> > >> @till
>> > >>
>> > >> > Just to clarify Xintong, you suggest that Task off-heap memory
>> > >> represents
>> > >> > direct and native memory. Since we don't know how the user will
>> > allocate
>> > >> > the memory we will add this value to -XX:MaxDirectMemorySize so
>> that
>> > the
>> > >> > process won't fail if the user allocates only direct memory and no
>> > >> native
>> > >> > memory. Is that correct?
>> > >> >
>> > >> Yes, this is what I mean.
>> > >>
>> > >>
>> > >> Thank you~
>> > >>
>> > >> Xintong Song
>> > >>
>> > >>
>> > >>
>> > >> On Wed, Sep 4, 2019 at 4:25 PM Till Rohrmann 
>> > >> wrote:
>> > >>
>> > >> > Just to clarify Xintong, you suggest that Task off-heap memory
>> > >> represents
>> > >> > direct and native memory. Since we don't know how the user will
>> > allocate
>> > >> > the memory we will add this value to -XX:MaxDirectMemorySize so
>> that
>> > the
>> > >> > process won't fail if the user allocates only direct memory and no
>> > >> native
>> > >> > memory. Is that correct?
>> > >> >
>> > >> > Cheers,
>> > >> > Till
>> > >> >
>> > >> > On Wed, Sep 4, 2019 at 10:18 AM Xintong Song <
>> tonysong...@gmail.com>
>> > >> > wrote:
>> > >> >
>> > >> > > @Stephan
>> > >> > > Not sure what do you mean by "just having this value". Are you
>> > >> suggesting
>> > >> > > that having "taskmanager.memory.network" refers to "shuffle" and
>> > >> "other
>> > >> > > network memory", or only "shuffle"?
>> > >> > >
>> > >> > > I guess what you mean is only "shuffle"? Because currently
>> > >> > > "taskmanager.network.memory" refers to shuffle buffers only,
>> which
>> > is
>> > >> > "one
>> > >> > > less config value to break".
>> > >> > >
>> > >> > > Thank you~
>> > >> > >
>> > >> > > Xintong Song
>> > 

Re: [VOTE] FLIP-49: Unified Memory Configuration for TaskExecutors

2019-09-05 Thread Xintong Song
+1 (non-binding) from my side.

@Yu, thanks for the vote.
- The current FLIP document already mentioned the issue that Unsafe is not
supported in Java 12, in the section 'Unifying Explicit and Implicit Memory
Allocation'. It makes sense to me to emphasize this by adding a separate
limitation section.
- I think we should also update the FLIP document if we change the config
names later in PRs. But I would not consider this a major change to the
FLIP that requires another vote, especially since we already agreed during
this vote to revisit the config names at the implementation stage.

Thank you~

Xintong Song



On Fri, Sep 6, 2019 at 2:43 AM Yu Li  wrote:

> +1 (non-binding)
>
> Minor:
> 1. Is it worth a separate "Limitations" section to contain all relative
> information like the Unsafe support issue in Java 12, just like many other
> FLIPs?
> 2. About the config names, if we change them later in PR, does it mean we
> will need to update the FLIP document? If so, it seems we need another vote
> after the modification according to current bylaw? Or maybe we could just
> create a subpage under the FLIP and only re-vote on that part later?
>
> Thanks.
>
> Best Regards,
> Yu
>
>
> On Thu, 5 Sep 2019 at 00:52, Stephan Ewen  wrote:
>
> > Let's not block on config key names, just go ahead and we figure this out
> > concurrently or on the PR later.
> >
> >
> > On Wed, Sep 4, 2019 at 3:48 PM Stephan Ewen  wrote:
> >
> > > Maybe to clear up confusion about my suggestion:
> > >
> > > I would vote to keep the name of the config parameter
> > > "taskmanager.memory.network" because it is the same key as currently
> > (good
> > > to not break things unless good reason) and there currently is no case
> or
> > > even a concrete follow-up where we would actually differentiate between
> > > different types of network memory.
> > >
> > > I would suggest to not prematurely rename this because of something
> that
> > > might happen in the future. Experience shows that its better to do
> these
> > > things when the actual change comes.
> > >
> > > Side note: I am not sure "shuffle" is a good term in this context. I
> have
> > > so far only heard that in batch contexts, which is not what we do here.
> > One
> > > more reason for me to not pre-maturely change names.
> > >
> > > On Wed, Sep 4, 2019 at 10:56 AM Xintong Song 
> > > wrote:
> > >
> > >> @till
> > >>
> > >> > Just to clarify Xintong, you suggest that Task off-heap memory
> > >> represents
> > >> > direct and native memory. Since we don't know how the user will
> > allocate
> > >> > the memory we will add this value to -XX:MaxDirectMemorySize so that
> > the
> > >> > process won't fail if the user allocates only direct memory and no
> > >> native
> > >> > memory. Is that correct?
> > >> >
> > >> Yes, this is what I mean.
> > >>
> > >>
> > >> Thank you~
> > >>
> > >> Xintong Song
> > >>
> > >>
> > >>
> > >> On Wed, Sep 4, 2019 at 4:25 PM Till Rohrmann 
> > >> wrote:
> > >>
> > >> > Just to clarify Xintong, you suggest that Task off-heap memory
> > >> represents
> > >> > direct and native memory. Since we don't know how the user will
> > allocate
> > >> > the memory we will add this value to -XX:MaxDirectMemorySize so that
> > the
> > >> > process won't fail if the user allocates only direct memory and no
> > >> native
> > >> > memory. Is that correct?
> > >> >
> > >> > Cheers,
> > >> > Till
> > >> >
> > >> > On Wed, Sep 4, 2019 at 10:18 AM Xintong Song  >
> > >> > wrote:
> > >> >
> > >> > > @Stephan
> > >> > > Not sure what do you mean by "just having this value". Are you
> > >> suggesting
> > >> > > that having "taskmanager.memory.network" refers to "shuffle" and
> > >> "other
> > >> > > network memory", or only "shuffle"?
> > >> > >
> > >> > > I guess what you mean is only "shuffle"? Because currently
> > >> > > "taskmanager.network.memory" refers to shuffle buffers only, which
> > is
> > >> > "one
> > >> > > less config value to break".
> > >> > >
> > >> > > Thank you~
> > >> > >
> > >> > > Xintong Song
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Wed, Sep 4, 2019 at 3:42 PM Stephan Ewen 
> > wrote:
> > >> > >
> > >> > > > If we later split the network memory into "shuffle" and "other
> > >> network
> > >> > > > memory", I think it would make sense to split the option then.
> > >> > > >
> > >> > > > In that case "taskmanager.memory.network" would probably refer
> to
> > >> the
> > >> > > total
> > >> > > > network memory, which would most likely be what most users
> > actually
> > >> > > > configure.
> > >> > > > My feeling is that for now just having this value is actually
> > >> easier,
> > >> > and
> > >> > > > it is one less config value to break (which is also good).
> > >> > > >
> > >> > > > On Wed, Sep 4, 2019 at 9:05 AM Xintong Song <
> > tonysong...@gmail.com>
> > >> > > wrote:
> > >> > > >
> > >> > > > > Thanks for the voting and comments.
> > >> > > > >
> > >> > > > > @Stephan
> > >> > > > > - The '-XX:MaxDirectMemorySize' value should not include JVM
> > 

Re: [VOTE] FLIP-62: Set default restart delay for FixedDelay- and FailureRateRestartStrategy to 1s

2019-09-05 Thread vino yang
+1 (non-binding)

Best,
Vino

Yu Li  wrote on Fri, Sep 6, 2019 at 2:13 AM:

> +1 (non-binding)
>
> Best Regards,
> Yu
>
>
> On Thu, 5 Sep 2019 at 00:23, zhijiang 
> wrote:
>
> > +1
> >
> > Best,
> > Zhijiang
> > --
> > From:Jark Wu 
> > Send Time: Wednesday, September 4, 2019, 13:45
> > To:dev 
> > Subject:Re: [VOTE] FLIP-62: Set default restart delay for FixedDelay- and
> > FailureRateRestartStrategy to 1s
> >
> > +1
> >
> > Best,
> > Jark
> >
> > > On Sep 4, 2019, at 19:43, David Morávek  wrote:
> > >
> > > +1
> > >
> > > On Wed, Sep 4, 2019 at 1:38 PM Till Rohrmann 
> > wrote:
> > >
> > >> +1 (binding)
> > >>
> > >> On Wed, Sep 4, 2019 at 12:43 PM Chesnay Schepler 
> > >> wrote:
> > >>
> > >>> +1 (binding)
> > >>>
> > >>> On 04/09/2019 11:18, JingsongLee wrote:
> >  +1 (non-binding)
> > 
> >  default 0 is really not user production friendly.
> > 
> >  Best,
> >  Jingsong Lee
> > 
> > 
> >  --
> >  From:Zhu Zhu 
> >  Send Time: Wednesday, September 4, 2019, 17:13
> >  To:dev 
> >  Subject:Re: [VOTE] FLIP-62: Set default restart delay for
> FixedDelay-
> > >>> and FailureRateRestartStrategy to 1s
> > 
> >  +1 (non-binding)
> > 
> >  Thanks,
> >  Zhu Zhu
> > 
> >  Till Rohrmann  wrote on Wed, Sep 4, 2019 at 5:06 PM:
> > 
> > > Hi everyone,
> > >
> > > I would like to start the voting process for FLIP-62 [1], which
> > > is discussed and reached consensus in this thread [2].
> > >
> > > Since the change is rather small I'd like to shorten the voting
> > period
> > >>> to
> > > 48 hours. Hence, I'll try to close it September 6th, 11:00 am CET,
> > >>> unless
> > > there is an objection or not enough votes.
> > >
> > > [1]
> > >
> > >
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-62%3A+Set+default+restart+delay+for+FixedDelay-+and+FailureRateRestartStrategy+to+1s
> > > [2]
> > >
> > >
> > >>>
> > >>
> >
> https://lists.apache.org/thread.html/9602b342602a0181fcb618581f3b12e692ed2fad98c59fd6c1caeabd@%3Cdev.flink.apache.org%3E
> > >
> > > Cheers,
> > > Till
> > >
> > >>>
> > >>>
> > >>
> >
> >
>


Re: [ANNOUNCE] Java 11 cron builds activated on master

2019-09-05 Thread vino yang
Great news Chesnay! Thanks for your hard work and effort!

Best,
Vino

Chesnay Schepler  wrote on Thu, Sep 5, 2019 at 9:58 PM:

> Hello everyone,
>
> I just wanted to inform everyone that we now run Java 11 builds on
> Travis as part of the cron jobs, subsuming the existing Java 9 tests.
> All existing Java 9 build/test infrastructure has been removed.
>
> If you spot any test failures that appear to be specific to Java 11,
> please add a sub-task to FLINK-10725.
>
> I would also encourage everyone to try out Java 11 for local development
> and usage, so that we can find pain points in the dev and user experience.
>
>


Re: [DISCUSS] Contribute Pulsar Flink connector back to Flink

2019-09-05 Thread Bowen Li
Hi,

I think having a Pulsar connector in Flink can be of mutual benefit to
both communities.

Another perspective is that the Pulsar connector is the first streaming
connector that integrates with Flink's metadata management system and Catalog
APIs. It'll be interesting to see how the integration turns out and whether we
need to improve the Flink Catalog stack, which is currently in beta, to cater
to streaming sources/sinks. Thus I'm in favor of merging the Pulsar connector
into Flink 1.10.

I'd suggest submitting smaller PRs, e.g. one for the basic source/sink
functionality and another for the schema and catalog integration, just to make
them easier to review.

It doesn't seem to hurt to wait for FLIP-27. But I don't think FLIP-27 should
be a blocker in case it cannot make its way into 1.10 or doesn't leave a
reasonable amount of time for committers to review or for the Pulsar connector
to fully adapt to the new interfaces.

Bowen



On Thu, Sep 5, 2019 at 3:21 AM Becket Qin  wrote:

> Hi Till,
>
> You are right. It all depends on when the new source interface is going to
> be ready. Personally I think it would be there in about a month or so. But
> I could be too optimistic. It would also be good to hear what do Aljoscha
> and Stephan think as they are also involved in FLIP-27.
>
> In general I think we should have Pulsar connector in Flink 1.10,
> preferably with the new source interface. We can also check it in right now
> with old source interface, but I suspect few users will use it before the
> next official release. Therefore, it seems reasonable to wait a little bit
> to see whether we can jump to the new source interface. As long as we make
> sure Flink 1.10 has it, waiting a little bit doesn't seem to hurt much.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Thu, Sep 5, 2019 at 3:59 PM Till Rohrmann  wrote:
>
> > Hi everyone,
> >
> > I'm wondering what the problem would be if we committed the Pulsar
> > connector before the new source interface is ready. If I understood it
> > correctly, then we need to support the old source interface anyway for
> the
> > existing connectors. By checking it in early I could see the benefit that
> > our users could start using the connector earlier. Moreover, it would
> > prevent that the Pulsar integration is being delayed in case that the
> > source interface should be delayed. The only downside I see is the extra
> > review effort and potential fixes which might be irrelevant for the new
> > source interface implementation. I guess it mainly depends on how certain
> > we are when the new source interface will be ready.
> >
> > Cheers,
> > Till
> >
> > On Thu, Sep 5, 2019 at 8:56 AM Becket Qin  wrote:
> >
> > > Hi Sijie and Yijie,
> > >
> > > Thanks for sharing your thoughts.
> > >
> > > Just want to have some update on FLIP-27. Although the FLIP wiki and
> > > discussion thread has been quiet for some time, a few committer /
> > > contributors in Flink community were actually prototyping the entire
> > thing.
> > > We have made some good progress there but want to update the FLIP wiki
> > > after the entire thing is verified to work in case there are some last
> > > minute surprise in the implementation. I don't have an exact ETA yet,
> > but I
> > > guess it is going to be within a month or so.
> > >
> > > I am happy to review the current Flink Pulsar connector and see if it
> > would
> > > fit in FLIP-27. It would be good to avoid the case that we checked in
> the
> > > Pulsar connector with some review efforts and shortly after that the
> new
> > > Source interface is ready.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Thu, Sep 5, 2019 at 8:39 AM Yijie Shen 
> > > wrote:
> > >
> > > > Thanks for all the feedback and suggestions!
> > > >
> > > > As Sijie said, the goal of the connector has always been to provide
> > > > users with the latest features of both systems as soon as possible.
> We
> > > > propose to contribute the connector to Flink and hope to get more
> > > > suggestions and feedback from Flink experts to ensure the high
> quality
> > > > of the connector.
> > > >
> > > > For FLIP-27, we noticed its existence at the beginning of reworking
> > > > the connector implementation based on Flink 1.9; we also wanted to
> > > > build a connector that supports both batch and stream computing based
> > > > on it.
> > > > However, it has been inactive for some time, so we decided to provide
> > > > a connector with most of the new features, such as the new type
> system
> > > > and the new catalog API first. We will pay attention to the progress
> > > > of FLIP-27 continually and incorporate it with the connector as soon
> > > > as possible.
> > > >
> > > > Regarding the test status of the connector, we are following the
> other
> > > > connectors' test in Flink repository and aimed to provide throughout
> > > > tests as we could. We are also happy to hear suggestions and
> > > > supervision from the Flink community to improve the stability and
> 

Re: [VOTE] FLIP-49: Unified Memory Configuration for TaskExecutors

2019-09-05 Thread Yu Li
+1 (non-binding)

Minor:
1. Is it worth adding a separate "Limitations" section to contain all related
information, like the Unsafe support issue in Java 12, just like many other
FLIPs do?
2. About the config names: if we change them later in a PR, does it mean we
will need to update the FLIP document? If so, it seems we would need another
vote after the modification according to the current bylaws? Or maybe we could
just create a subpage under the FLIP and only re-vote on that part later?

Thanks.

Best Regards,
Yu


On Thu, 5 Sep 2019 at 00:52, Stephan Ewen  wrote:

> Let's not block on config key names, just go ahead and we figure this out
> concurrently or on the PR later.
>
>
> On Wed, Sep 4, 2019 at 3:48 PM Stephan Ewen  wrote:
>
> > Maybe to clear up confusion about my suggestion:
> >
> > I would vote to keep the name of the config parameter
> > "taskmanager.memory.network" because it is the same key as currently
> (good
> > to not break things unless good reason) and there currently is no case or
> > even a concrete follow-up where we would actually differentiate between
> > different types of network memory.
> >
> > I would suggest to not prematurely rename this because of something that
> > might happen in the future. Experience shows that its better to do these
> > things when the actual change comes.
> >
> > Side note: I am not sure "shuffle" is a good term in this context. I have
> > so far only heard that in batch contexts, which is not what we do here.
> One
> > more reason for me to not pre-maturely change names.
> >
> > On Wed, Sep 4, 2019 at 10:56 AM Xintong Song 
> > wrote:
> >
> >> @till
> >>
> >> > Just to clarify Xintong, you suggest that Task off-heap memory
> >> represents
> >> > direct and native memory. Since we don't know how the user will
> allocate
> >> > the memory we will add this value to -XX:MaxDirectMemorySize so that
> the
> >> > process won't fail if the user allocates only direct memory and no
> >> native
> >> > memory. Is that correct?
> >> >
> >> Yes, this is what I mean.
> >>
> >>
> >> Thank you~
> >>
> >> Xintong Song
> >>
> >>
> >>
> >> On Wed, Sep 4, 2019 at 4:25 PM Till Rohrmann 
> >> wrote:
> >>
> >> > Just to clarify Xintong, you suggest that Task off-heap memory
> >> represents
> >> > direct and native memory. Since we don't know how the user will
> allocate
> >> > the memory we will add this value to -XX:MaxDirectMemorySize so that
> the
> >> > process won't fail if the user allocates only direct memory and no
> >> native
> >> > memory. Is that correct?
> >> >
> >> > Cheers,
> >> > Till
> >> >
> >> > On Wed, Sep 4, 2019 at 10:18 AM Xintong Song 
> >> > wrote:
> >> >
> >> > > @Stephan
> >> > > Not sure what do you mean by "just having this value". Are you
> >> suggesting
> >> > > that having "taskmanager.memory.network" refers to "shuffle" and
> >> "other
> >> > > network memory", or only "shuffle"?
> >> > >
> >> > > I guess what you mean is only "shuffle"? Because currently
> >> > > "taskmanager.network.memory" refers to shuffle buffers only, which
> is
> >> > "one
> >> > > less config value to break".
> >> > >
> >> > > Thank you~
> >> > >
> >> > > Xintong Song
> >> > >
> >> > >
> >> > >
> >> > > On Wed, Sep 4, 2019 at 3:42 PM Stephan Ewen 
> wrote:
> >> > >
> >> > > > If we later split the network memory into "shuffle" and "other
> >> network
> >> > > > memory", I think it would make sense to split the option then.
> >> > > >
> >> > > > In that case "taskmanager.memory.network" would probably refer to
> >> the
> >> > > total
> >> > > > network memory, which would most likely be what most users
> actually
> >> > > > configure.
> >> > > > My feeling is that for now just having this value is actually
> >> easier,
> >> > and
> >> > > > it is one less config value to break (which is also good).
> >> > > >
> >> > > > On Wed, Sep 4, 2019 at 9:05 AM Xintong Song <
> tonysong...@gmail.com>
> >> > > wrote:
> >> > > >
> >> > > > > Thanks for the voting and comments.
> >> > > > >
> >> > > > > @Stephan
> >> > > > > - The '-XX:MaxDirectMemorySize' value should not include JVM
> >> > Overhead.
> >> > > > > Thanks for correction.
> >> > > > > - 'taskmanager.memory.framework.heap' it heap memory reserved
> for
> >> > task
> >> > > > > executor framework, which can not be allocated to task slots. I
> >> think
> >> > > > users
> >> > > > > should be able to configure both how many total java heap
> memory a
> >> > task
> >> > > > > executor should have and how many of the total java heap memory
> >> can
> >> > be
> >> > > > > allocated to task slots.
> >> > > > > - Regarding network / shuffle memory, I'm with @Andrey. ATM,
> this
> >> > > memory
> >> > > > > pool (derived from
> >> "taskmanager.network.memory.[min/max/fraction]")
> >> > is
> >> > > > only
> >> > > > > used inside NettyShuffleEnvironment as network buffers. There
> >> might
> >> > be
> >> > > > > other network memory usage outside the shuffle component (as
> >> > @Zhijiang
> >> > > > also
> >> > > > > suggested), but that is not 

Re: [VOTE] FLIP-62: Set default restart delay for FixedDelay- and FailureRateRestartStrategy to 1s

2019-09-05 Thread Yu Li
+1 (non-binding)

Best Regards,
Yu


On Thu, 5 Sep 2019 at 00:23, zhijiang 
wrote:

> +1
>
> Best,
> Zhijiang
> --
> From:Jark Wu 
> Send Time: Wednesday, September 4, 2019, 13:45
> To:dev 
> Subject:Re: [VOTE] FLIP-62: Set default restart delay for FixedDelay- and
> FailureRateRestartStrategy to 1s
>
> +1
>
> Best,
> Jark
>
> > On Sep 4, 2019, at 19:43, David Morávek  wrote:
> >
> > +1
> >
> > On Wed, Sep 4, 2019 at 1:38 PM Till Rohrmann 
> wrote:
> >
> >> +1 (binding)
> >>
> >> On Wed, Sep 4, 2019 at 12:43 PM Chesnay Schepler 
> >> wrote:
> >>
> >>> +1 (binding)
> >>>
> >>> On 04/09/2019 11:18, JingsongLee wrote:
>  +1 (non-binding)
> 
>  default 0 is really not user production friendly.
> 
>  Best,
>  Jingsong Lee
> 
> 
>  --
>  From:Zhu Zhu 
>  Send Time: Wednesday, September 4, 2019, 17:13
>  To:dev 
>  Subject:Re: [VOTE] FLIP-62: Set default restart delay for FixedDelay-
> >>> and FailureRateRestartStrategy to 1s
> 
>  +1 (non-binding)
> 
>  Thanks,
>  Zhu Zhu
> 
>  Till Rohrmann  wrote on Wed, Sep 4, 2019 at 5:06 PM:
> 
> > Hi everyone,
> >
> > I would like to start the voting process for FLIP-62 [1], which
> > is discussed and reached consensus in this thread [2].
> >
> > Since the change is rather small I'd like to shorten the voting
> period
> >>> to
> > 48 hours. Hence, I'll try to close it September 6th, 11:00 am CET,
> >>> unless
> > there is an objection or not enough votes.
> >
> > [1]
> >
> >
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-62%3A+Set+default+restart+delay+for+FixedDelay-+and+FailureRateRestartStrategy+to+1s
> > [2]
> >
> >
> >>>
> >>
> https://lists.apache.org/thread.html/9602b342602a0181fcb618581f3b12e692ed2fad98c59fd6c1caeabd@%3Cdev.flink.apache.org%3E
> >
> > Cheers,
> > Till
> >
> >>>
> >>>
> >>
>
>


Re: instable checkpointing after migration to flink 1.8

2019-09-05 Thread Yun Tang
Hi Bekir

From what I can see, there are two main factors influencing the time of the
synchronous part of the checkpoint within that task:

  1.  Snapshotting the heap timers to S3 [1] (network IO)
  2.  Creating the local RocksDB checkpoint on disk [2] (disk IO)

For the first part, unfortunately, there are no logs or metrics that show how
long it takes.
For the second part, you could log in to the machine running that task and
find the RocksDB logs (the default DB folder is
{io.tmp.dirs}/flink-io-xxx/job-xxx/db and the log file name is LOG). You can
check the interval between the "Started the snapshot process -- creating
snapshot in directory" and "Snapshot DONE" log entries to know how long RocksDB
takes to create the checkpoint on local disk.

If we configure "state.backend.rocksdb.timer-service.factory" to "ROCKSDB", we
avoid the first part entirely, and this might be a solution to your problem.
To be honest, though, the timer snapshot code stayed almost the same between
Flink 1.6 and Flink 1.8, so this should not be a regression.
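A minimal sketch of that setting, assuming it is applied cluster-wide via
flink-conf.yaml (the option name is taken verbatim from the paragraph above):

```yaml
# Store timers in RocksDB instead of on the heap, so the synchronous part of the
# checkpoint no longer has to snapshot the timers from heap to S3 (factor 1 above).
state.backend.rocksdb.timer-service.factory: ROCKSDB
```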

[1] 
https://github.com/apache/flink/blob/ba1cd43b6ed42070fd9013159a5b65adb18b3975/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/AbstractStreamOperator.java#L453
[2] 
https://github.com/apache/flink/blob/ba1cd43b6ed42070fd9013159a5b65adb18b3975/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/snapshot/RocksIncrementalSnapshotStrategy.java#L249

Best
Yun Tang

From: Congxian Qiu 
Sent: Thursday, September 5, 2019 10:38
To: Bekir Oguz 
Cc: Stephan Ewen ; dev ; Niels 
Alebregtse ; Vladislav Bakayev 

Subject: Re: instable checkpointing after migration to flink 1.8

One more piece of information from our private emails:

there are ALWAYS Kafka AbstractCoordinator logs about a lost connection to
Kafka at the same time the checkpoints are confirmed. Bekir checked the
Kafka broker log, but did not find anything interesting there.

Best,
Congxian


Congxian Qiu  wrote on Thu, Sep 5, 2019 at 10:26 AM:

> Hi Bekir,
>
> If it is the storage place for timers, for RocksDBStateBackend, timers can
> be stored in Heap or RocksDB[1][2]
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#tuning-rocksdb
> [2]
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/state_backends.html#rocksdb-state-backend-config-options
>
> Best,
> Congxian
>
>
> Bekir Oguz  wrote on Wed, Sep 4, 2019 at 11:38 PM:
>
>> Hi Stephan,
>> sorry for late response.
>> We indeed use timers inside a KeyedProcessFunction but the triggers of
>> the timers are kinda evenly distributed, so not causing a firing storm.
>> We have a custom ttl logic which is used by the deletion timer to decide
>> whether delete a record from inmemory state or not.
>> Can you maybe give some links to those changes in the RocksDB?
>>
>> Thanks in advance,
>> Bekir Oguz
>>
>> On Fri, 30 Aug 2019 at 14:56, Stephan Ewen  wrote:
>>
>>> Hi all!
>>>
>>> A thought would be that this has something to do with timers. Does the
>>> task with that behavior use timers (windows, or process function)?
>>>
>>> If that is the case, some theories to check:
>>>   - Could it be a timer firing storm coinciding with a checkpoint?
>>> Currently, that storm synchronously fires, checkpoints cannot preempt that,
>>> which should change in 1.10 with the new mailbox model.
>>>   - Could the timer-async checkpointing changes have something to do
>>> with that? Does some of the usually small "preparation work" (happening
>>> synchronously) lead to an unwanted effect?
>>>   - Are you using TTL for state in that operator?
>>>   - There were some changes made to support timers in RocksDB recently.
>>> Could that have something to do with it?
>>>
>>> Best,
>>> Stephan
>>>
>>>
>>> On Fri, Aug 30, 2019 at 2:45 PM Congxian Qiu 
>>> wrote:
>>>
 CC flink dev mail list
 Update for those who may be interested in this issue, we'are still
 diagnosing this problem currently.

 Best,
 Congxian


 Congxian Qiu  wrote on Thu, Aug 29, 2019 at 8:58 PM:

 > Hi Bekir
 >
 > Currently, from what we have diagnosed, there is some task complete
 its
 > checkpoint too late (maybe 15 mins), but we checked the kafka broker
 log
 > and did not find any interesting things there. could we run another
 job,
 > that did not commit offset to kafka, this wants to check if it is the
 > "commit offset to kafka" step consumes too much time.
 >
 > Best,
 > Congxian
 >
 >
 > Bekir Oguz  wrote on Wed, Aug 28, 2019 at 4:19 PM:
 >
 >> Hi Congxian,
 >> sorry for the late reply, but no progress on this issue yet. I
 checked
 >> also the kafka broker logs, found nothing interesting there.
 >> And we still have 15 min duration checkpoints quite often. Any more
 ideas
 >> on where to look at?
 >>
 >> Regards,
 >> Bekir
 >>
 >> On Fri, 23 Aug 2019 at 12:12, Congxian Qiu 

[jira] [Created] (FLINK-13979) Translate new streamfilesink docs to chinese

2019-09-05 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-13979:
--

 Summary: Translate new streamfilesink docs to chinese
 Key: FLINK-13979
 URL: https://issues.apache.org/jira/browse/FLINK-13979
 Project: Flink
  Issue Type: New Feature
  Components: chinese-translation, Documentation
Reporter: Gyula Fora


The StreamFileSink docs have been reworked as part of FLINK-13842



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13978) Evaluate Azure Pipelines as a CI tool for Flink

2019-09-05 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-13978:
--

 Summary: Evaluate Azure Pipelines as a CI tool for Flink
 Key: FLINK-13978
 URL: https://issues.apache.org/jira/browse/FLINK-13978
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Reporter: Robert Metzger
Assignee: Robert Metzger


See ML discussion: 
[https://lists.apache.org/thread.html/b90aa518fcabce94f8e1de4132f46120fae613db6e95a2705f1bd1ea@%3Cdev.flink.apache.org%3E]
 

We want to try out Azure Pipelines for the following reasons:
 * more mature system (compared to travis)
 * 10 parallel, 6 hrs builds for open source
 * ability to add custom machines

 

(See also INFRA-17030) 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[ANNOUNCE] Java 11 cron builds activated on master

2019-09-05 Thread Chesnay Schepler

Hello everyone,

I just wanted to inform everyone that we now run Java 11 builds on 
Travis as part of the cron jobs, subsuming the existing Java 9 tests. 
All existing Java 9 build/test infrastructure has been removed.


If you spot any test failures that appear to be specific to Java 11, 
please add a sub-task to FLINK-10725.


I would also encourage everyone to try out Java 11 for local development 
and usage, so that we can find pain points in the dev and user experience.




[jira] [Created] (FLINK-13977) Adapt HighAvailabilityServices#getWebMonitorLeaderElectionService to #getClusterRestEndpointLeaderRetriever

2019-09-05 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-13977:
-

 Summary: Adapt 
HighAvailabilityServices#getWebMonitorLeaderElectionService to 
#getClusterRestEndpointLeaderRetriever
 Key: FLINK-13977
 URL: https://issues.apache.org/jira/browse/FLINK-13977
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.10.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 1.10.0


I propose to adapt 
{{HighAvailabilityServices#getWebMonitorLeaderElectionService}} to 
{{#getClusterRestEndpointLeaderRetriever}} by renaming this method to 
{{getClusterRestEndpointLeaderElectionService}}. That way it is consistent and 
not too confusing.

Concretely, I propose to deprecate the {{#getWebMonitorLeaderElectionService}} 
and introduce  {{#getClusterRestEndpointLeaderElectionService}} with a default 
implementation to fall back to {{#getWebMonitorLeaderElectionService}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13976) Make modern kafka E2E test runnable on Java 11

2019-09-05 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-13976:


 Summary: Make modern kafka E2E test runnable on Java 11
 Key: FLINK-13976
 URL: https://issues.apache.org/jira/browse/FLINK-13976
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Kafka
Reporter: Chesnay Schepler
 Fix For: 1.10.0


The modern Kafka E2E test currently fails on Java 11. We should find a way to
make at least this one runnable.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13975) Translate "Upcoming Events" on Chinese index.html

2019-09-05 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-13975:
-

 Summary: Translate "Upcoming Events" on Chinese index.html
 Key: FLINK-13975
 URL: https://issues.apache.org/jira/browse/FLINK-13975
 Project: Flink
  Issue Type: Task
  Components: chinese-translation, Project Website
Reporter: Fabian Hueske


We recently added a section for "Upcoming Events" to the index page of the 
Flink website.

We need to translate "Upcoming Events" on the Chinese version of the main page.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: [DISCUSS] Reducing build times

2019-09-05 Thread Robert Metzger
I do have a working Azure setup, yes. E2E tests are not included in the
3.5hrs.

Yesterday, I became aware of a major blocker with Azure pipelines: Apache
Infra does not allow it to be integrated with Apache GitHub repositories,
because it requires write access (for a simple usability feature) [1]. This
means that we "have" to use CiBot for the time being.
I've also reached out to Microsoft to see if they can do anything about it.

+1 For setting it up with CiBot immediately.

[1]https://issues.apache.org/jira/browse/INFRA-17030

On Thu, Sep 5, 2019 at 11:04 AM Chesnay Schepler  wrote:

> I assume you already have a working (and verified) azure setup?
>
> Once we're running things on azure on the apache repo people will
> invariably use that as a source of truth because fancy check marks will
> yet again appear on commits. Hence I'm wary of running experiments here;
> I would prefer if we only activate it once things are confirmed to be
> working.
>
> For observation purposes, we could also add it to flink-ci with
> notifications to people who are interested in this experiment.
> This wouldn't impact CiBot.
>
> On 03/09/2019 18:57, Robert Metzger wrote:
> > Hi all,
> >
> > I wanted to give a short update on this:
> > - Arvid, Aljoscha and I have started working on a Gradle PoC, currently
> > working on making all modules compile and test with Gradle. We've also
> > identified some problematic areas (shading being the most obvious one)
> > which we will analyse as part of the PoC.
> > The goal is to see how much Gradle helps to parallelise our build, and to
> > avoid duplicate work (incremental builds).
> >
> > - I am working on setting up a Flink testing infrastructure based on
> Azure
> > Pipelines, using more powerful hardware. Alibaba kindly provided me with
> > two 32 core machines (temporarily), and another company reached out to
> > privately, looking into options for cheap, fast machines :)
> > If nobody in the community disagrees, I am going to set up Azure
> Pipelines
> > with our apache/flink GitHub as a build infrastructure that exists next
> to
> > Flinkbot and flink-ci. I would like to make sure that Azure Pipelines is
> > equally or even more reliable than Travis, and I want to see what the
> > required maintenance work is.
> > On top of that, Azure Pipelines is a very feature-rich tool with a lot of
> > nice options for us to improve the build experience (statistics about
> tests
> > (flaky tests etc.), nice docker support, plenty of free build resources
> for
> > open source projects, ...)
> >
> > Best,
> > Robert
> >
> >
> >
> >
> >
> > On Mon, Aug 19, 2019 at 5:12 PM Robert Metzger 
> wrote:
> >
> >> Hi all,
> >>
> >> I have summarized all arguments mentioned so far + some additional
> >> research into a Wiki page here:
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=125309279
> >>
> >> I'm happy to hear further comments on my summary! I'm pretty sure we can
> >> find more pro's and con's for the different options.
> >>
> >> My opinion after looking at the options:
> >>
> >> - Flink relies on an outdated build tool (Maven), while a good
> >> alternative is well-established (gradle), and will likely provide a
> much
> >> better CI and local build experience through incremental build and
> cached
> >> intermediates.
> >> Scripting around Maven, or splitting modules / test execution /
> >> repositories won't solve this problem. We should rather spend the
> effort in
> >> migrating to a modern build tool which will provide us benefits in
> the long
> >> run.
> >> - Flink relies on a fairly slow build service (Travis CI), while
> >> simply putting more money onto the problem could cut the build time
> at
> >> least in half.
> >> We should consider using a build service that provides bigger
> machines
> >> to solve our build time problem.
> >>
> >> My opinion is based on many assumptions (gradle is actually as fast as
> >> promised (haven't used it before), we can build Flink with gradle, we
> find
> >> sponsors for bigger build machines) that we need to test first through
> PoCs.
> >>
> >> Best,
> >> Robert
> >>
> >>
> >>
> >>
> >> On Mon, Aug 19, 2019 at 10:26 AM Aljoscha Krettek 
> >> wrote:
> >>
> >>> I did a quick test: a normal "mvn clean install -DskipTests
> >>> -Drat.skip=true -Dmaven.javadoc.skip=true -Punsafe-mapr-repo” on my
> machine
> >>> takes about 14 minutes. After removing all mentions of
> maven-shade-plugin
> >>> the build time goes down to roughly 11.5 minutes. (Obviously the
> resulting
> >>> Flink won’t work, because some expected stuff is not packaged and most
> of
> >>> the end-to-end tests use the shade plugin to package the jars for
> testing.
> >>>
> >>> Aljoscha
> >>>
>  On 18. Aug 2019, at 19:52, Robert Metzger 
> wrote:
> 
>  Hi all,
> 
>  I wanted to understand the impact of the hardware we are using for
> >>> running
>  our tests. Each travis worker has 2 virtual 

[jira] [Created] (FLINK-13974) isAssignable function return wrong result

2019-09-05 Thread forideal (Jira)
forideal created FLINK-13974:


 Summary: isAssignable function return wrong result
 Key: FLINK-13974
 URL: https://issues.apache.org/jira/browse/FLINK-13974
 Project: Flink
  Issue Type: Bug
  Components: API / Type Serialization System
Affects Versions: 1.9.0
Reporter: forideal
 Attachments: image-2019-09-05-20-40-05-041.png

!image-2019-09-05-20-40-05-041.png!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: Re: [DISCUSS] Support JSON functions in Flink SQL

2019-09-05 Thread vino yang
+1 to have JSON functions in Flink SQL

JingsongLee  wrote on Thu, Sep 5, 2019 at 4:46 PM:

> +1
> Nice document. I think it is easier to do after expression reworking[1].
> By the way, which planner do you want to start?
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-51%3A+Rework+of+the+Expression+Design
>
> Best,
> Jingsong Lee
>
>
> --
> From:TANG Wen-hui 
> Send Time: Thursday, September 5, 2019, 14:36
> To:dev 
> Subject:Re: Re: [DISCUSS] Support JSON functions in Flink SQL
>
> +1
> I have done similar work before.
> Looking forward to discussing this feature.
>
> Best
> wenhui
>
>
>
> winifred.wenhui.t...@gmail.com
>
> From: Kurt Young
> Date: 2019-09-05 14:00
> To: dev
> CC: Anyang Hu
> Subject: Re: [DISCUSS] Support JSON functions in Flink SQL
> +1 to add JSON support to Flink. We also see lots of requirements for JSON
> related functions in our internal platform. Since these are already SQL
> standard, I think it's a good time to add them to Flink.
>
> Best,
> Kurt
>
>
> On Thu, Sep 5, 2019 at 10:37 AM Qi Luo  wrote:
>
> > We also see strong demands from our SQL users for JSON/Date related
> > functions.
> >
> > Also +Anyang Hu 
> >
> > On Wed, Sep 4, 2019 at 9:51 PM Jark Wu  wrote:
> >
> > > Hi Forward,
> > >
> > > Thanks for bringing this discussion and preparing the nice design.
> > > I think it's nice to have the JSON functions in the next release.
> > > We have received some requirements for this feature.
> > >
> > > I can help to shepherd this JSON functions effort and will leave
> comments
> > >  in the design doc in the next days.
> > >
> > > Hi Danny,
> > >
> > > The new introduced JSON functions are from SQL:2016, not from MySQL.
> > > So there no JSON type is needed. According to the SQL:2016, the
> > > representation of JSON data can be "character string" which is also
> > > the current implementation in Calcite[1].
> > >
> > > Best,
> > > Jark
> > >
> > >
> > > [1]: https://calcite.apache.org/docs/reference.html#json-functions
> > >
> > >
> > > On Wed, 4 Sep 2019 at 21:22, Xu Forward 
> wrote:
> > >
> > > > hi Danny Chan ,Thank you very much for your reply, your help can help
> > me
> > > > further improve this discussion.
> > > > Best
> > > > forward
> > > >
> > > > > Danny Chan  wrote on Wed, Sep 4, 2019 at 8:50 PM:
> > > >
> > > > > Thanks Xu Forward for bring up this topic, I think the JSON
> functions
> > > are
> > > > > very useful especially for those MySQL users.
> > > > >
> > > > > I saw that you have done some work within the Apache Calcite,
> that’s
> > a
> > > > > good start, but this is one concern from me, Flink doesn’t support
> > JSON
> > > > > type internal, so how to represent a JSON object in Flink maybe a
> key
> > > > point
> > > > > we need to resolve. In Calcite, we use ANY type to represent as the
> > > JSON,
> > > > > but I don’t think it is the right way to go, maybe we can have a
> > > > discussion
> > > > > here.
> > > > >
> > > > > Best,
> > > > > Danny Chan
> > > > > On Sep 4, 2019 at 8:34 PM +0800, Xu Forward  wrote:
> > > > > > Hi everybody,
> > > > > >
> > > > > > I'd like to kick off a discussion on Support JSON functions in
> > Flink
> > > > SQL.
> > > > > >
> > > > > > The entire plan is divided into two steps:
> > > > > > 1. Implement Support SQL 2016-2017 JSON functions in Flink
> SQL[1].
> > > > > > 2. Implement non-Support SQL 2016-2017 JSON functions in Flink
> SQL,
> > > > such
> > > > > as
> > > > > > JSON_TYPE in Mysql, JSON_LENGTH, etc. Very useful JSON functions.
> > > > > >
> > > > > > Would love to hear your thoughts.
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1JfaFYIFOAY8P2pFhOYNCQ9RTzwF4l85_bnTvImOLKMk/edit#heading=h.76mb88ca6yjp
> > > > > >
> > > > > > Best,
> > > > > > ForwardXu
> > > > >
> > > >
> > >
> >
>


Re: Fine grained batch recovery vs. native libraries

2019-09-05 Thread Fabian Hueske
Thanks for reporting the problem David!

Cheers,
Fabian

On Wed, Sep 4, 2019 at 14:09, David Morávek  wrote:

> Hi Chesnay, I've created FLINK-13958
>  to track the issue.
>
> Thanks,
> D.
>
> On Wed, Sep 4, 2019 at 1:56 PM Chesnay Schepler 
> wrote:
>
> > This sounds like a serious bug, please open a JIRA ticket.
> >
> > On 04/09/2019 13:41, David Morávek wrote:
> > > Hi,
> > >
> > > we're testing the newly released batch recovery and are running into
> > class
> > > loading related issues.
> > >
> > > 1) We have a per-job flink cluster
> > > 2) We use BATCH execution mode + region failover strategy
> > >
> > > Point 1) should imply single user code class loader per task manager
> > > (because there is only single pipeline, that reuses class loader cached
> > in
> > > BlobLibraryCacheManager). We need this property, because we have UDFs
> > that
> > > access C libraries using JNI (I think this may be fairly common
> use-case
> > > when dealing with legacy code). JDK internals
> > > <
> >
> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ClassLoader.java#L2466
> > >
> > > make sure that single library can be only loaded by a single class
> loader
> > > per JVM.
> > >
> > > When region recovery is triggered, vertices that need recover are first
> > > reset back to CREATED stated and then rescheduled. In case all tasks
> in a
> > > task manager are reset, this results in cached class loader being
> > released
> > > <
> >
> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L338
> > >.
> > > This unfortunately causes job failure, because we try to reload a
> native
> > > library in a newly created class loader.
> > >
> > > I know that there is always possibility to distribute native libraries
> > with
> > > flink's libs and load it using system class loader, but this
> introduces a
> > > build & operations overhead and just make it really unfriendly for
> > cluster
> > > user, so I'd rather not work around the issue this way (per-job cluster
> > > should be more user friendly).
> > >
> > > I believe the correct approach would be not to release cached class
> > loader
> > > if the job is recovering, even though there are no tasks currently
> > > registered with TM.
> > >
> > > What do you think? Thanks for help.
> > >
> > > D.
> > >
> >
> >
>
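For readers less familiar with the JNI constraint described in the quoted report, the pattern that breaks could look roughly like the following (a minimal hypothetical sketch; the class and library names are made up):

import org.apache.flink.api.common.functions.MapFunction;

// Hypothetical UDF that binds a native library via JNI.
public class NativeScoringFunction implements MapFunction<String, Double> {

    static {
        // Runs once per class loader. The JVM binds a given native library to
        // only one class loader per process, so loading it again from a new
        // user-code class loader fails with an UnsatisfiedLinkError
        // ("Native Library ... already loaded in another classloader").
        System.loadLibrary("scoring"); // hypothetical libscoring.so
    }

    private static native double score(String input);

    @Override
    public Double map(String value) {
        return score(value);
    }
}

If region failover releases the cached user-code class loader and the tasks are redeployed with a fresh one, the static initializer runs again in the new loader and the second System.loadLibrary call fails, which matches the failure described above.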


Re: ClassLoader created by BlobLibraryCacheManager is not using context classloader

2019-09-05 Thread Jan Lukavský

Hi Till and Aljoscha,

I was investigating the other options, but unfortunately all of them 
look a little complicated (although possible, of course). But before 
going into more complicated solutions, I'd like to know what issues you
actually see with using the context class loader. I can think of one
difficulty - if (for whatever reason) the context class loader doesn't
contain (in itself or as a parent) the class loader that loaded the Flink core
classes, that would probably cause trouble. So, what about a solution
where we take, as the parent class loader of FlinkUserCodeClassLoaders, a class
loader that is:


 a) the context class loader of the current thread, if it either is the
class loader of the Flink core classes, or contains that class loader
in its parent hierarchy, or


 b) the class loader of the Flink core classes

That way, the class loader of the Flink core classes would always be in the
parent hierarchy of FlinkUserCodeClassLoaders. Would that solve the issues
you see? It works for me.
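To make this concrete, the selection logic could be sketched roughly as follows (the class and method names are made up for illustration; this is not existing Flink code):

import org.apache.flink.api.java.ExecutionEnvironment;

// Sketch of the parent class loader selection described in (a) and (b) above.
public final class UserCodeParentClassLoaderSelection {

    /** The class loader that loaded the Flink core classes. */
    private static ClassLoader flinkCoreClassLoader() {
        return ExecutionEnvironment.class.getClassLoader();
    }

    /**
     * Case (a): use the context class loader if the Flink core class loader is
     * that loader itself or appears somewhere in its parent chain.
     * Case (b): otherwise fall back to the Flink core class loader.
     */
    public static ClassLoader resolveParentForUserCode() {
        ClassLoader flinkCore = flinkCoreClassLoader();
        ClassLoader candidate = Thread.currentThread().getContextClassLoader();
        for (ClassLoader cl = candidate; cl != null; cl = cl.getParent()) {
            if (cl == flinkCore) {
                return candidate;
            }
        }
        return flinkCore;
    }
}

Whatever loader this returns, the Flink core class loader ends up in the parent hierarchy of the FlinkUserCodeClassLoaders created from it.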


Jan

On 9/3/19 4:52 PM, Jan Lukavský wrote:

Answers inline.

On 9/3/19 4:01 PM, Till Rohrmann wrote:

How so? Does your REPL add the generated classes to the system class
loader? I assume the system class loader is used to load the Flink 
classes.
No, it does not. It cannot on JDK >= 9 (or would have to hack into 
jdk.internal.loader.ClassLoaders, which I don't want to :)). It just 
creates another class loader, and is able to create a jar from 
generated files. The jar is used for remote execution.


Ideally, what you would like to have is the option to provide the parent
class loader which is used load user code to the LocalEnvironment. 
This one

could then be forwarded to the TaskExecutor where it is used to generate
the user code class loader. But this is a bigger effort.
I'm not sure how this differs from using the context classloader? Maybe
there is a subtle difference in that this is a little more explicit. On
the other hand, users normally do not modify class loaders, so the 
practical impact is IMHO negligible. But maybe this opens another 
possibility - we probably could add optional ClassLoader parameter to 
LocalEnvironment, with default value of 
FlinkRunner.class.getClassLoader()? That might be a good compromise.


The downside to this approach is that it requires you to create a jar 
file

and to submit it via a REST call. The upside is that it is closer to the
production setting.


Yes, a REPL has to do that anyway to support distributed computing, so 
this is not an issue.


Jan



Cheers,
Till

On Tue, Sep 3, 2019 at 3:47 PM Jan Lukavský  wrote:


On the other hand, if you say, that the contract of LocalEnvironment is
to execute as if it had all classes on its class loader, then it
currently breaks this contract. :-)

Jan

On 9/3/19 3:45 PM, Jan Lukavský wrote:

Hi Till,

hmm, that sounds it might work. I would have to incorporate this
(either as default, or on demand) into Apache Beam. Would you see any
disadvantages of this approach? Would you suggest to make this default
behavior for local beam FlinkRunner? I can introduce a configuration
option to turn this behavior on, but that would bring additional
maintenance burden, etc., etc.

Jan

On 9/3/19 3:38 PM, Till Rohrmann wrote:

I see the problem Jan. What about the following proposal: Instead of
using
the LocalEnvironment for local tests you always use the
RemoteEnvironment
but when testing it locally you spin up a MiniCluster in the same
process
and initialize the RemoteEnvironment with 
`MiniCluster#getRestAddress`.
That way you would always submit a jar with the generated classes 
and,

hence, not having to set the context class loader.

The contract of the LocalEnvironment is indeed that all classes it is
supposed to execute must be present when being started.

Cheers,
Till

On Tue, Sep 3, 2019 at 2:27 PM guaishushu1...@163.com <
guaishushu1...@163.com> wrote:




guaishushu1...@163.com

From: guaishushu1...@163.com
Date: 2019-09-03 20:25
To: dev
Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager 
is not

using context classloader




guaishushu1...@163.com
From: guaishushu1...@163.com
Date: 2019-09-03 20:23
To: dev
Subject: Re: Re: ClassLoader created by BlobLibraryCacheManager 
is not

using context classloader
guaishushu1...@163.com
From: Jan Lukavský
Date: 2019-09-03 20:17
To: dev
Subject: Re: ClassLoader created by BlobLibraryCacheManager is not
using
context classloader
Hi Till,
the use-case is pretty much simple - I have a REPL shell in groovy,
which generates classes at runtime. The actual hierarchy is 
therefore

system class loader -> application classloader -> repl classloader
(GroovyClassLoader actually)
now, when a terminal (sink) operation in the shell occurs, I'm 
able to

build a jar, which I can submit to remote cluster (in distributed
case).
But - during testing -  I run the code using local flink. There 
is no
(legal) way of adding this new runtime generated jar to local 
flink. As
I said, I have a hackish solution 

Re: [DISCUSS] Flink Python User-Defined Function for Table API

2019-09-05 Thread jincheng sun
Hi Aljoscha,

Thanks for your comments!

Regarding the FLIP scope, it seems that we have agreed on the design of
the stateless function support.
What do you think about starting the development of the stateless function
support first and continuing the discussion of stateful function support?
Or do you think we should split the current FLIP into two FLIPs and discuss
the stateful function support in another thread?

Currently, the Python DataView/MapView/ListView interface design follows
the Java/Scala naming conventions.
Of course, we can continue to discuss whether there are better solutions,
e.g. using annotations.

Regarding the magic logic to support DataView/MapView/ListView, it will
be done by the framework and is transparent to users.
Per my understanding, the magic logic is unavoidable no matter what the
interfaces will be.

Regarding the catalog support of Python functions:
1) If it's stored in memory as a temporary object, just as you said, users can
call TableEnvironment.register_function (which will change to
register_temporary_function in FLIP-64).
2) If it's persisted in external storage, users can call
Catalog.create_function. There will be no API change per my understanding.

What do you think?
Best,
Jincheng

On Thu, Sep 5, 2019 at 5:32 PM, Aljoscha Krettek  wrote:

> Hi,
>
> Another thing to consider is the Scope of the FLIP. Currently, we try to
> support (stateful) AggregateFunctions. I have some concerns about whether
> or not DataView/MapView/ListView is a good interface because it requires
> quite some magic from the runners to make it work, such as messing with the
> TypeInformation and injecting objects at runtime. If the FLIP aims for the
> minimum of ScalarFunctions and the whole execution harness, that should be
> easier to agree on.
>
> Another point is the naming of the new methods. I think Timo hinted at the
> fact that we have to consider catalog support for functions. There is
> ongoing work about differentiating between temporary objects and objects
> that are stored in a catalog (FLIP-64 [1]). With this in mind, the method
> for registering functions should be called register_temporary_function()
> and so on. Unless we want to already think about mixing Python and Java
> functions in the catalog, which is outside the scope of this FLIP, I think.
>
> Best,
> Aljoscha
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>
>
> > On 5. Sep 2019, at 05:01, jincheng sun  wrote:
> >
> > Hi Aljoscha,
> >
> > That's a good points, so far, most of the code will live in flink-python
> > module, and the rules and relNodes will be put into the both blink and
> > flink planner modules, some of the common interface of required by
> planners
> > will be placed in flink-table-common. I think you are right, we should
> try
> > to ensure the changes of this feature is minimal.  For more detail we
> would
> > follow this principle when review the PRs.
> >
> > Great thanks for your questions and remind!
> >
> > Best,
> > Jincheng
> >
> >
> > On Wed, Sep 4, 2019 at 8:58 PM, Aljoscha Krettek  wrote:
> >
> >> Hi,
> >>
> >> Things looks interesting so far!
> >>
> >> I had one question: Where will most of the support code for this live?
> >> Will this add the required code to flink-table-common or the different
> >> runners? Can we implement this in such a way that only a minimal amount
> of
> >> support code is required in the parts of the Table API (and Table API
> >> runners) that  are not python specific?
> >>
> >> Best,
> >> Aljoscha
> >>
> >>> On 4. Sep 2019, at 14:14, Timo Walther  wrote:
> >>>
> >>> Hi Jincheng,
> >>>
> >>> 2. Serializability of functions: "#2 is very convenient for users"
> means
> >> only until they have the first backwards-compatibility issue, after that
> >> they will find it not so convenient anymore and will ask why the
> framework
> >> allowed storing such objects in a persistent storage. I don't want to be
> >> picky about it, but wanted to raise awareness that sometimes it is ok to
> >> limit use cases to guide users for developing backwards-compatible
> programs.
> >>>
> >>> Thanks for the explanation of the remaining items. It sounds reasonable
> >> to me. Regarding the example with `getKind()`, I actually meant
> >> `org.apache.flink.table.functions.ScalarFunction#getKind` we don't allow
> >> users to override this property. And I think we should do something
> similar
> >> for the getLanguage property.
> >>>
> >>> Thanks,
> >>> Timo
> >>>
> >>> On 03.09.19 15:01, jincheng sun wrote:
>  Hi Timo,
> 
>  Thanks for the quick reply ! :)
>  I have added more example for #3 and #5 to the FLIP. That are great
>  suggestions !
> 
>  Regarding 2:
> 
>  There are two kinds of serialization for CloudPickle (which is different from
> from
>  Java):
>  1) For class and function which can be imported, CloudPickle only
>  serialize the full path of the class and function (just like java
> class
>  name).
> 

Re: Please add me as contributor

2019-09-05 Thread Fabian Hueske
Hi Jan,

I assigned FLINK-13925 to you.
Thanks for your contribution!

Best, Fabian

On Tue, Sep 3, 2019 at 10:52, Jan Lukavský  wrote:

> Hi Dawid,
>
> thanks for the explanation. I got warning from PR [1] associated with
> JIRA [2].
>
> Jan
>
> [1] https://github.com/apache/flink/pull/9579
>
> [2] https://issues.apache.org/jira/browse/FLINK-13925
>
> On 9/3/19 10:43 AM, Dawid Wysakowicz wrote:
> > Hi Jan,
> >
> > Recently the community changed the contribution process a bit and there
> > are no longer contributor privileges. The jira issues are supposed to be
> > assigned by committers that are willing to help you with getting the
> > contribution in. Please look at the contribution guidelines[1]. Do you
> > have some particular jira ticket in mind that you are interested in
> > working on?
> >
> > Best,
> >
> > Dawid
> >
> >
> > [1] https://flink.apache.org/contributing/contribute-code.html
> >
> > On 03/09/2019 10:18, Jan Lukavský wrote:
> >> Hi,
> >>
> >> I'd like to be able to assign JIRAs to myself, can I be added as
> >> contributor, please? My JIRA ID is 'janl'.
> >>
> >> Thanks,
> >>
> >>   Jan
> >>
>


Re: [DISCUSS] Contribute Pulsar Flink connector back to Flink

2019-09-05 Thread Becket Qin
Hi Till,

You are right. It all depends on when the new source interface is going to
be ready. Personally I think it would be there in about a month or so. But
I could be too optimistic. It would also be good to hear what Aljoscha
and Stephan think, as they are also involved in FLIP-27.

In general I think we should have the Pulsar connector in Flink 1.10,
preferably with the new source interface. We can also check it in right now
with the old source interface, but I suspect few users will use it before the
next official release. Therefore, it seems reasonable to wait a little bit
to see whether we can jump to the new source interface. As long as we make
sure Flink 1.10 has it, waiting a little bit doesn't seem to hurt much.

Thanks,

Jiangjie (Becket) Qin

On Thu, Sep 5, 2019 at 3:59 PM Till Rohrmann  wrote:

> Hi everyone,
>
> I'm wondering what the problem would be if we committed the Pulsar
> connector before the new source interface is ready. If I understood it
> correctly, then we need to support the old source interface anyway for the
> existing connectors. By checking it in early I could see the benefit that
> our users could start using the connector earlier. Moreover, it would
> prevent the Pulsar integration from being delayed in case the
> source interface is delayed. The only downside I see is the extra
> review effort and potential fixes which might be irrelevant for the new
> source interface implementation. I guess it mainly depends on how certain
> we are when the new source interface will be ready.
>
> Cheers,
> Till
>
> On Thu, Sep 5, 2019 at 8:56 AM Becket Qin  wrote:
>
> > Hi Sijie and Yijie,
> >
> > Thanks for sharing your thoughts.
> >
> > Just want to have some update on FLIP-27. Although the FLIP wiki and
> > discussion thread has been quiet for some time, a few committer /
> > contributors in Flink community were actually prototyping the entire
> thing.
> > We have made some good progress there but want to update the FLIP wiki
> > after the entire thing is verified to work in case there are some last
> > minute surprise in the implementation. I don't have an exact ETA yet,
> but I
> > guess it is going to be within a month or so.
> >
> > I am happy to review the current Flink Pulsar connector and see if it
> would
> > fit in FLIP-27. It would be good to avoid the case that we checked in the
> > Pulsar connector with some review efforts and shortly after that the new
> > Source interface is ready.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Thu, Sep 5, 2019 at 8:39 AM Yijie Shen 
> > wrote:
> >
> > > Thanks for all the feedback and suggestions!
> > >
> > > As Sijie said, the goal of the connector has always been to provide
> > > users with the latest features of both systems as soon as possible. We
> > > propose to contribute the connector to Flink and hope to get more
> > > suggestions and feedback from Flink experts to ensure the high quality
> > > of the connector.
> > >
> > > For FLIP-27, we noticed its existence at the beginning of reworking
> > > the connector implementation based on Flink 1.9; we also wanted to
> > > build a connector that supports both batch and stream computing based
> > > on it.
> > > However, it has been inactive for some time, so we decided to provide
> > > a connector with most of the new features, such as the new type system
> > > and the new catalog API first. We will pay attention to the progress
> > > of FLIP-27 continually and incorporate it with the connector as soon
> > > as possible.
> > >
> > > Regarding the test status of the connector, we are following the other
> > > connectors' tests in the Flink repository and aimed to provide as thorough
> > > tests as we could. We are also happy to hear suggestions and
> > > supervision from the Flink community to improve the stability and
> > > performance of the connector continuously.
> > >
> > > Best,
> > > Yijie
> > >
> > > On Thu, Sep 5, 2019 at 5:59 AM Sijie Guo  wrote:
> > > >
> > > > Thanks everyone for the comments and feedback.
> > > >
> > > > It seems to me that the main question here is about - "how can the
> > Flink
> > > > community maintain the connector?".
> > > >
> > > > Here are two thoughts from myself.
> > > >
> > > > 1) I think how and where to host this integration is kind of less
> > > important
> > > > here. I believe there can be many ways to achieve it.
> > > > As part of the contribution, what we are looking for here is how
> these
> > > two
> > > > communities can build the collaboration relationship on developing
> > > > the integration between Pulsar and Flink. Even we can try our best to
> > > catch
> > > > up all the updates in Flink community. We are still
> > > > facing the fact that we have less experiences in Flink than folks in
> > > Flink
> > > > community. In order to make sure we maintain and deliver
> > > > a high-quality pulsar-flink integration to the users who use both
> > > > technologies, we need some help from the experts from Flink
> 

[jira] [Created] (FLINK-13972) Move PreviewPlanEnvironment to flink-tests

2019-09-05 Thread Kostas Kloudas (Jira)
Kostas Kloudas created FLINK-13972:
--

 Summary: Move PreviewPlanEnvironment to flink-tests
 Key: FLINK-13972
 URL: https://issues.apache.org/jira/browse/FLINK-13972
 Project: Flink
  Issue Type: Sub-task
  Components: Client / Job Submission
Affects Versions: 1.9.0
Reporter: Kostas Kloudas
Assignee: Kostas Kloudas


The PreviewPlanEnvironment is already used only in tests. This issue simply
moves it into the package where it is used.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: Storing offsets in Kafka

2019-09-05 Thread Becket Qin
No, I don't think so.

As long as you have a successful checkpoint, the offsets will be committed.
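For reference, a minimal sketch of that checkpoint-based setup (topic name, group id, addresses and the checkpoint interval are placeholders, not taken from your job):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class OffsetCommitExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000L); // offsets are committed to Kafka on each successful checkpoint

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "my-group"); // the group whose offsets appear in __consumer_offsets

        FlinkKafkaConsumer<String> consumer =
                new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);
        consumer.setCommitOffsetsOnCheckpoints(true); // default value, shown here for clarity
        consumer.setStartFromGroupOffsets();          // resume from the committed group offsets on restart

        env.addSource(consumer).print();
        env.execute("offset-commit-example");
    }
}

Note that only offsets of completed checkpoints are committed, so a job that fails before its first checkpoint will not have committed anything.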

Thanks,

Jiangjie (Becket) Qin

On Thu, Sep 5, 2019 at 4:56 PM Dominik Wosiński  wrote:

> Hey,
> Yeah I am using the first case. Is there a specific requirement for
> checkpoints ? Like do they need to be externalized or so ?
>
>
> Best,
> Dom.
>
> On Thu, Sep 5, 2019 at 05:32, Becket Qin  wrote:
>
> > Hi Dominik,
> >
> > There has not been any change to the offset committing logic in
> > KafkaConsumer for a while. But the logic is a little complicated. The
> > offset commit to Kafka is only enabled in the following two cases:
> >
> > 1. Flink checkpoint is enabled AND commitOffsetsOnCheckpoint is true
> > (default value is true)
> > 2. Flink checkpoint is disabled AND the vanilla KafkaConsumer has a)
> > enable.auto.commit=true (default value is true); b)
> > auto.commit.interval.ms>0
> > (default value is 5000).
> >
> > Note that in case 1, if the job exits before the first checkpoint takes
> > place, then there will be no offset committed.
> >
> > Can you check if your setting falls in one of the two cases?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> >
> > On Wed, Sep 4, 2019 at 9:03 PM Dominik Wosiński 
> wrote:
> >
> > > Hey,
> > > I was wondering whether something has changed for KafkaConsumer, since
> I
> > am
> > > using Kafka 2.0.0 with Flink and I wanted to use group offsets but
> there
> > > seems to be no change in the topic where Kafka stores it's offsets,
> after
> > > restart Flink uses the `auto.offset.reset` so it seems that there is no
> > > offsets commit happening. The checkpoints are properly configured and I
> > am
> > > able to restore with Savepoint. But the group offsets are not working
> > > properly. Is there anything that has changed in this manner?
> > >
> > > Best Regards,
> > > Dom.
> > >
> >
>


Re: [DISCUSS] Flink Python User-Defined Function for Table API

2019-09-05 Thread Aljoscha Krettek
Hi,

Another thing to consider is the Scope of the FLIP. Currently, we try to 
support (stateful) AggregateFunctions. I have some concerns about whether or 
not DataView/MapView/ListView is a good interface because it requires quite 
some magic from the runners to make it work, such as messing with the 
TypeInformation and injecting objects at runtime. If the FLIP aims for the 
minimum of ScalarFunctions and the whole execution harness, that should be 
easier to agree on.

Another point is the naming of the new methods. I think Timo hinted at the fact 
that we have to consider catalog support for functions. There is ongoing work 
about differentiating between temporary objects and objects that are stored in 
a catalog (FLIP-64 [1]). With this in mind, the method for registering 
functions should be called register_temporary_function() and so on. Unless we 
want to already think about mixing Python and Java functions in the catalog, 
which is outside the scope of this FLIP, I think.

Best,
Aljoscha

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module


> On 5. Sep 2019, at 05:01, jincheng sun  wrote:
> 
> Hi Aljoscha,
> 
> That's a good point. So far, most of the code will live in the flink-python
> module, and the rules and relNodes will be put into both the blink and
> flink planner modules; some of the common interfaces required by the planners
> will be placed in flink-table-common. I think you are right, we should try
> to ensure the changes for this feature are minimal. For more detail, we will
> follow this principle when reviewing the PRs.
> 
> Great thanks for your questions and remind!
> 
> Best,
> Jincheng
> 
> 
> On Wed, Sep 4, 2019 at 8:58 PM, Aljoscha Krettek  wrote:
> 
>> Hi,
>> 
>> Things looks interesting so far!
>> 
>> I had one question: Where will most of the support code for this live?
>> Will this add the required code to flink-table-common or the different
>> runners? Can we implement this in such a way that only a minimal amount of
>> support code is required in the parts of the Table API (and Table API
>> runners) that  are not python specific?
>> 
>> Best,
>> Aljoscha
>> 
>>> On 4. Sep 2019, at 14:14, Timo Walther  wrote:
>>> 
>>> Hi Jincheng,
>>> 
>>> 2. Serializability of functions: "#2 is very convenient for users" means
>> only until they have the first backwards-compatibility issue, after that
>> they will find it not so convenient anymore and will ask why the framework
>> allowed storing such objects in a persistent storage. I don't want to be
>> picky about it, but wanted to raise awareness that sometimes it is ok to
>> limit use cases to guide users for developing backwards-compatible programs.
>>> 
>>> Thanks for the explanation of the remaining items. It sounds reasonable
>> to me. Regarding the example with `getKind()`, I actually meant
>> `org.apache.flink.table.functions.ScalarFunction#getKind` we don't allow
>> users to override this property. And I think we should do something similar
>> for the getLanguage property.
>>> 
>>> Thanks,
>>> Timo
>>> 
>>> On 03.09.19 15:01, jincheng sun wrote:
 Hi Timo,
 
 Thanks for the quick reply ! :)
 I have added more example for #3 and #5 to the FLIP. That are great
 suggestions !
 
 Regarding 2:
 
 There are two kinds of serialization for CloudPickle (which is different from
 Java):
 1) For class and function which can be imported, CloudPickle only
 serialize the full path of the class and function (just like java class
 name).
 2) For the class and function which can not be imported, CloudPickle
>> will
 serialize the full content of the class and function.
 For #2, It means that we can not just store the full path of the class
>> and
 function.
 
 The above serialization is recursive.
 
 However, there is indeed a problem of backwards compatibility when the
 module path of the parent class changes. But I think this is a rare
>> case
 and acceptable. i.e., For Flink framework we never change the user
 interface module path if we want to keep backwards compatibility. For
>> user
 code, if they change the interface of UDF's parent, they should
>> re-register
 their functions.
 
 If we do not want support #2, we can store the full path of class and
 function, in that case we have no backwards compatibility problem. But I
 think the #2 is very convenient for users.
 
 What do you think?
 
 Regarding 4:
 As I mentioned earlier, there may be built-in Python functions and I
>> think
 language is a "function" concept. Function and Language are orthogonal
 concepts.
 We may have R, GO and other language functions in the future, not only
 user-defined, but also built-in functions.
 
 You are right that users will not set this method and for Python
>> functions,
 it will be set in the code-generated Java function by the framework.
>> So, I
 

[jira] [Created] (FLINK-13971) Add TaskManager In org.apache.flink.runtime.rest.messages.JobVertexTaskManagersInfo.TaskManagersInfo

2019-09-05 Thread lining (Jira)
lining created FLINK-13971:
--

 Summary: Add TaskManager In 
org.apache.flink.runtime.rest.messages.JobVertexTaskManagersInfo.TaskManagersInfo
 Key: FLINK-13971
 URL: https://issues.apache.org/jira/browse/FLINK-13971
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / REST
Reporter: lining


Add the TaskManager id in
org.apache.flink.runtime.rest.messages.JobVertexTaskManagersInfo.TaskManagersInfo,
so that we can navigate from there to the dashboard of the TaskManager.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13970) Remove or move LifoSetQueue

2019-09-05 Thread TisonKun (Jira)
TisonKun created FLINK-13970:


 Summary: Remove or move LifoSetQueue
 Key: FLINK-13970
 URL: https://issues.apache.org/jira/browse/FLINK-13970
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Coordination
Affects Versions: 1.10.0
Reporter: TisonKun
 Fix For: 1.10.0


Hi [~till.rohrmann] I found a class {{LifoSetQueue}} which is not in use any
more. IIRC it was previously used in {{Scheduler}} and {{Instance}}. Shall we remove
this class as well, or move it to some directory that collects utils?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: [DISCUSS] Reducing build times

2019-09-05 Thread Chesnay Schepler

I assume you already have a working (and verified) azure setup?

Once we're running things on azure on the apache repo people will 
invariably use that as a source of truth because fancy check marks will 
yet again appear on commits. Hence I'm wary of running experiments here; 
I would prefer if we only activate it once things are confirmed to be 
working.


For observation purposes, we could also add it to flink-ci with 
notifications to people who are interested in this experiment.

This wouldn't impact CiBot.

On 03/09/2019 18:57, Robert Metzger wrote:

Hi all,

I wanted to give a short update on this:
- Arvid, Aljoscha and I have started working on a Gradle PoC, currently
working on making all modules compile and test with Gradle. We've also
identified some problematic areas (shading being the most obvious one)
which we will analyse as part of the PoC.
The goal is to see how much Gradle helps to parallelise our build, and to
avoid duplicate work (incremental builds).

- I am working on setting up a Flink testing infrastructure based on Azure
Pipelines, using more powerful hardware. Alibaba kindly provided me with
two 32 core machines (temporarily), and another company reached out to
privately, looking into options for cheap, fast machines :)
If nobody in the community disagrees, I am going to set up Azure Pipelines
with our apache/flink GitHub as a build infrastructure that exists next to
Flinkbot and flink-ci. I would like to make sure that Azure Pipelines is
equally or even more reliable than Travis, and I want to see what the
required maintenance work is.
On top of that, Azure Pipelines is a very feature-rich tool with a lot of
nice options for us to improve the build experience (statistics about tests
(flaky tests etc.), nice docker support, plenty of free build resources for
open source projects, ...)

Best,
Robert





On Mon, Aug 19, 2019 at 5:12 PM Robert Metzger  wrote:


Hi all,

I have summarized all arguments mentioned so far + some additional
research into a Wiki page here:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=125309279

I'm happy to hear further comments on my summary! I'm pretty sure we can
find more pro's and con's for the different options.

My opinion after looking at the options:

- Flink relies on an outdated build tool (Maven), while a good
alternative is well-established (gradle), and will likely provide a much
better CI and local build experience through incremental build and cached
intermediates.
Scripting around Maven, or splitting modules / test execution /
repositories won't solve this problem. We should rather spend the effort in
migrating to a modern build tool which will provide us benefits in the long
run.
- Flink relies on a fairly slow build service (Travis CI), while
simply putting more money onto the problem could cut the build time at
least in half.
We should consider using a build service that provides bigger machines
to solve our build time problem.

My opinion is based on many assumptions (gradle is actually as fast as
promised (haven't used it before), we can build Flink with gradle, we find
sponsors for bigger build machines) that we need to test first through PoCs.

Best,
Robert




On Mon, Aug 19, 2019 at 10:26 AM Aljoscha Krettek 
wrote:


I did a quick test: a normal "mvn clean install -DskipTests
-Drat.skip=true -Dmaven.javadoc.skip=true -Punsafe-mapr-repo" on my machine
takes about 14 minutes. After removing all mentions of maven-shade-plugin
the build time goes down to roughly 11.5 minutes. (Obviously the resulting
Flink won’t work, because some expected stuff is not packaged and most of
the end-to-end tests use the shade plugin to package the jars for testing.)

Aljoscha


On 18. Aug 2019, at 19:52, Robert Metzger  wrote:

Hi all,

I wanted to understand the impact of the hardware we are using for

running

our tests. Each travis worker has 2 virtual cores, and 7.5 gb memory

[1].

They are using Google Cloud Compute Engine *n1-standard-2* instances.
Running a full "mvn clean verify" takes *03:32 h* on such a machine

type.

Running the same workload on a 32 virtual cores, 64 gb machine, takes

*1:21

h*.

What is interesting are the per-module build time differences.
Modules which are parallelizing tests well greatly benefit from the
additional cores:
"flink-tests" 36:51 min vs 4:33 min
"flink-runtime" 23:41 min vs 3:47 min
"flink-table-planner" 15:54 min vs 3:13 min

On the other hand, we have modules which are not parallel at all:
"flink-connector-kafka": 16:32 min vs 15:19 min
"flink-connector-kafka-0.11": 9:52 min vs 7:46 min
Also, the checkstyle plugin is not scaling at all.

Chesnay reported some significant speedups by reusing forks.
I don't know how much effort it would be to make the Kafka tests
parallelizable. In total, they currently use 30 minutes on the big

machine

(while 31 CPUs are idling :) )

Let me know what you think about these results. If the 

Re: Storing offsets in Kafka

2019-09-05 Thread Dominik Wosiński
Hey,
Yeah I am using the first case. Is there a specific requirement for
checkpoints ? Like do they need to be externalized or so ?


Best,
Dom.

On Thu, Sep 5, 2019 at 05:32, Becket Qin  wrote:

> Hi Dominik,
>
> There has not been any change to the offset committing logic in
> KafkaConsumer for a while. But the logic is a little complicated. The
> offset commit to Kafka is only enabled in the following two cases:
>
> 1. Flink checkpoint is enabled AND commitOffsetsOnCheckpoint is true
> (default value is true)
> 2. Flink checkpoint is disabled AND the vanilla KafkaConsumer has a)
> enable.auto.commit=true (default value is true); b)
> auto.commit.interval.ms>0
> (default value is 5000).
>
> Note that in case 1, if the job exits before the first checkpoint takes
> place, then there will be no offset committed.
>
> Can you check if your setting falls in one of the two cases?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
>
> On Wed, Sep 4, 2019 at 9:03 PM Dominik Wosiński  wrote:
>
> > Hey,
> > I was wondering whether something has changed for KafkaConsumer, since I
> am
> > using Kafka 2.0.0 with Flink and I wanted to use group offsets but there
> > seems to be no change in the topic where Kafka stores it's offsets, after
> > restart Flink uses the `auto.offset.reset` so it seems that there is no
> > offsets commit happening. The checkpoints are properly configured and I
> am
> > able to restore with Savepoint. But the group offsets are not working
> > properly. Is there anything that has changed in this manner?
> >
> > Best Regards,
> > Dom.
> >
>


Re: Re: [DISCUSS] Support JSON functions in Flink SQL

2019-09-05 Thread JingsongLee
+1
Nice document. I think it is easier to do after expression reworking[1].
By the way, which planner do you want to start with?

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-51%3A+Rework+of+the+Expression+Design

Best,
Jingsong Lee


--
From:TANG Wen-hui 
Send Time: 2019-09-05 (Thursday) 14:36
To:dev 
Subject:Re: Re: [DISCUSS] Support JSON functions in Flink SQL

+1 
I have done similar work before. 
Looking forward to discussing this feature.

Best
wenhui



winifred.wenhui.t...@gmail.com

From: Kurt Young
Date: 2019-09-05 14:00
To: dev
CC: Anyang Hu
Subject: Re: [DISCUSS] Support JSON functions in Flink SQL
+1 to add JSON support to Flink. We also see lots of requirements for JSON
related functions in our internal platform. Since these are already SQL
standard, I think it's a good time to add them to Flink.

Best,
Kurt


On Thu, Sep 5, 2019 at 10:37 AM Qi Luo  wrote:

> We also see strong demands from our SQL users for JSON/Date related
> functions.
>
> Also +Anyang Hu 
>
> On Wed, Sep 4, 2019 at 9:51 PM Jark Wu  wrote:
>
> > Hi Forward,
> >
> > Thanks for bringing this discussion and preparing the nice design.
> > I think it's nice to have the JSON functions in the next release.
> > We have received some requirements for this feature.
> >
> > I can help to shepherd this JSON functions effort and will leave comments
> >  in the design doc in the next days.
> >
> > Hi Danny,
> >
> > The new introduced JSON functions are from SQL:2016, not from MySQL.
> > So there no JSON type is needed. According to the SQL:2016, the
> > representation of JSON data can be "character string" which is also
> > the current implementation in Calcite[1].
> >
> > Best,
> > Jark
> >
> >
> > [1]: https://calcite.apache.org/docs/reference.html#json-functions
> >
> >
> > On Wed, 4 Sep 2019 at 21:22, Xu Forward  wrote:
> >
> > > hi Danny Chan ,Thank you very much for your reply, your help can help
> me
> > > further improve this discussion.
> > > Best
> > > forward
> > >
> > > On Wed, Sep 4, 2019 at 8:50 PM, Danny Chan  wrote:
> > >
> > > > Thanks Xu Forward for bring up this topic, I think the JSON functions
> > are
> > > > very useful especially for those MySQL users.
> > > >
> > > > I saw that you have done some work within the Apache Calcite, that’s
> a
> > > > good start, but this is one concern from me, Flink doesn’t support
> JSON
> > > > type internal, so how to represent a JSON object in Flink maybe a key
> > > point
> > > > we need to resolve. In Calcite, we use ANY type to represent as the
> > JSON,
> > > > but I don’t think it is the right way to go, maybe we can have a
> > > discussion
> > > > here.
> > > >
> > > > Best,
> > > > Danny Chan
> > > > On Sep 4, 2019 at 8:34 PM +0800, Xu Forward  wrote:
> > > > > Hi everybody,
> > > > >
> > > > > I'd like to kick off a discussion on Support JSON functions in
> Flink
> > > SQL.
> > > > >
> > > > > The entire plan is divided into two steps:
> > > > > 1. Implement Support SQL 2016-2017 JSON functions in Flink SQL[1].
> > > > > 2. Implement non-Support SQL 2016-2017 JSON functions in Flink SQL,
> > > such
> > > > as
> > > > > JSON_TYPE in Mysql, JSON_LENGTH, etc. Very useful JSON functions.
> > > > >
> > > > > Would love to hear your thoughts.
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1JfaFYIFOAY8P2pFhOYNCQ9RTzwF4l85_bnTvImOLKMk/edit#heading=h.76mb88ca6yjp
> > > > >
> > > > > Best,
> > > > > ForwardXu
> > > >
> > >
> >
>


Re: [DISCUSS] Adding configure and close methods to the Kafka(De)SerializationSchema interfaces

2019-09-05 Thread Gyula Fóra
Hi Arvid,

The ConfluentRegistryAvroDeserializationSchema uses a checkAvroInitialized()
call for every single record to initialize the schema for the first time.
This is clearly an indication of a missing open/configure method. In
addition some of the Kafka serializers rely on properties that are usually
passed together with the Kafka configuration. Adding a configure method
that gets the kafka properties provides a familiar way of implementing it
without having to pass properties twice. (Once for the Producer/Consumer
and once for the schema).

The example you mentioned doesn't implement any closing logic. Imagine if
the schema registry client had created a background thread to fetch data
and had to be closed. There is no way to do that now.
The Confluent schema registry doesn't work this way, but other registries
might.
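To illustrate, a wrapping schema written against the proposed methods could look roughly like this (the configure/close methods below do not exist on KafkaDeserializationSchema today and are exactly what is being proposed; the rest uses existing Flink and Kafka APIs):

import java.util.Map;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.Deserializer;

// Hypothetical wrapper around a vanilla Kafka Deserializer.
public class WrappingDeserializationSchema<T> implements KafkaDeserializationSchema<T> {

    private final Class<? extends Deserializer<T>> deserializerClass;
    private final TypeInformation<T> producedType;
    private transient Deserializer<T> deserializer;

    public WrappingDeserializationSchema(
            Class<? extends Deserializer<T>> deserializerClass, TypeInformation<T> producedType) {
        this.deserializerClass = deserializerClass;
        this.producedType = producedType;
    }

    // Proposed method: receives the same properties that were given to the Kafka
    // consumer, so they do not have to be passed to the schema a second time.
    public void configure(Map<String, ?> kafkaConfigs) throws Exception {
        deserializer = deserializerClass.getDeclaredConstructor().newInstance();
        deserializer.configure(kafkaConfigs, /* isKey */ false);
    }

    // Proposed method: releases resources (threads, connections) held by the wrapped deserializer.
    public void close() {
        if (deserializer != null) {
            deserializer.close();
        }
    }

    @Override
    public T deserialize(ConsumerRecord<byte[], byte[]> record) {
        return deserializer.deserialize(record.topic(), record.value());
    }

    @Override
    public boolean isEndOfStream(T nextElement) {
        return false;
    }

    @Override
    public TypeInformation<T> getProducedType() {
        return producedType;
    }
}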

I hope this answers your question.

Gyula


On Thu, Sep 5, 2019 at 10:01 AM Arvid Heise  wrote:

> Hi Gyula,
>
> when looking at the ConfluentRegistryAvroDeserializationSchema [1], it
> seems like the intended way is to pass all configuration parameters in the
> constructor. So you could call open there.
>
> Could you please line out in more details why this is not enough? What
> would you do in open and close respectively?
>
> Best,
>
> Arvid
>
> [1]
>
> https://github.com/apache/flink/blob/master/flink-formats/flink-avro-confluent-registry/src/main/java/org/apache/flink/formats/avro/registry/confluent/ConfluentRegistryAvroDeserializationSchema.java
>
> On Thu, Sep 5, 2019 at 9:43 AM Gyula Fóra  wrote:
>
> > Hi all!
> >
> > While implementing a new custom flink serialization schema that wraps an
> > existing Kafka serializer, I realized we are missing 2 key methods that
> > could be easily added:
> >
> > void configure(java.util.Map configs);
> > void close();
> >
> > We could rename configure to open but Kafka serializers have a configure
> > method.
> > The configure method would be called when the operator start with the
> > provided kafka properties and the close when it shuts down.
> >
> > Currently there is no way to access the properties from the schema
> > interfaces or close the schema on failure.
> >
> > This would be a very simple addition and could be added as optional
> methods
> > to the interface to not break any schemas that are implemented as
> lambdas.
> >
> > What do you think?
> >
> > Gyula
> >
>
>
> --
>
> Arvid Heise | Senior Software Engineer
>
> 
>
> Follow us @VervericaData
>
> --
>
> Join Flink Forward  - The Apache Flink
> Conference
>
> Stream Processing | Event Driven | Real Time
>
> --
>
> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>
> --
> Ververica GmbH
> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen
>


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-05 Thread Xuefu Z
Hi David,

Thanks for sharing your thoughts and requests for clarification. I believe
that I fully understood your proposal, which does have its merit. However,
it's different from ours. Here are the answers to your questions:

Re #1: yes, the temp functions in the proposal are global and have just
one-part names, similar to built-in functions. Two- or three-part names are
not allowed.

Re #2: not applicable as two- or three-part names are disallowed.

Re #3: same as above. Referencing external built-in functions is achieved
either implicitly (only the built-in functions in the current catalogs are
considered) or via special syntax such as cat::function. However, we are
looking into the modular approach that Time suggested with other feedback
received from the community.

Re #4: the resolution order goes like the following in our proposal:

1. temporary functions
2. built-in functions (including those augmented by add-on modules)
3. built-in functions in current catalog (this will not be needed if the
special syntax "cat::function" is required)
4. functions in current catalog and db.

If we go with the modular approach and make external built-in functions as
an add-on module, the 2 and 3 above will be combined. In essence, the
resolution order is equivalent in the two approaches.

By the way, the resolution order matters only for simple name references. For
names such as db.function (interpreted as current_cat/db/function) or
cat.db.function, the reference is unambiguous, so no resolution is needed.
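To illustrate the order for a simple (one-part) name, the lookup could be sketched as follows (the method names are made up and do not correspond to existing Flink code):

import java.util.Optional;
import java.util.function.Supplier;

public class FunctionResolutionSketch {

    Optional<Object> resolveSimpleName(String name) {
        return firstPresent(
                () -> lookupTemporaryFunction(name),      // 1. temporary functions
                () -> lookupBuiltinFunction(name),        // 2. built-in functions (incl. module add-ons)
                () -> lookupCatalogBuiltinFunction(name), // 3. built-in functions of the current catalog
                () -> lookupCatalogFunction(name));       // 4. functions in the current catalog and database
    }

    @SafeVarargs
    private static Optional<Object> firstPresent(Supplier<Optional<Object>>... lookups) {
        for (Supplier<Optional<Object>> lookup : lookups) {
            Optional<Object> result = lookup.get();
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    // Placeholder lookups; real implementations would consult the temporary
    // function map, the built-in/module registry, and the catalog API.
    private Optional<Object> lookupTemporaryFunction(String name) { return Optional.empty(); }
    private Optional<Object> lookupBuiltinFunction(String name) { return Optional.empty(); }
    private Optional<Object> lookupCatalogBuiltinFunction(String name) { return Optional.empty(); }
    private Optional<Object> lookupCatalogFunction(String name) { return Optional.empty(); }
}

With the modular approach, step 3 would simply fold into step 2, as described above.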

As it can be seen, the proposed concept regarding temp function and
function resolution is quite simple. Additionally, the proposed resolution
order allows temp function to shadow a built-in function, which is
important (though not decisive) in our opinion.

I started liking the modular approach as the resolution order will only
include 1, 2, and 4, which is simpler and more generic. That's why I
suggested we look more into this direction.

Please let me know if there are further questions.

Thanks,
Xuefu




On Thu, Sep 5, 2019 at 2:42 PM Dawid Wysakowicz 
wrote:

> Hi Xuefu,
>
> Just wanted to summarize my opinion on the one topic (temporary functions).
>
> My preference would be to make temporary functions always 3-part qualified
> (as a result that would prohibit overriding built-in functions). Having
> said that if the community decides that it's better to allow overriding
> built-in functions I am fine with it and can commit to that decision.
>
> I wanted to ask if you could clarify a few points for me around that
> option.
>
>1. Would you enforce temporary functions to be always just a single
>name (without db & cat) as hive does, or would you allow also 3 or even 2
>part identifiers?
>2. Assuming 2/3-part paths. How would you register a function from a
>following statement: CREATE TEMPORARY FUNCTION db.func? Would that shadow
>all functions named 'func' in all databases named 'db' in all catalogs? Or
>would you shadow only function 'func' in database 'db' in current catalog?
>3. This point is still under discussion, but was mentioned a few
>times, that maybe we want to enable syntax cat.func for "external built-in
>functions". How would that affect statement from previous point? Would
>'db.func' shadow "external built-in function" in 'db' catalog or user
>functions as in point 2? Or maybe both?
>4. Lastly in fact to summarize the previous points. Assuming 2/3-part
>paths. Would the function resolution be actually as follows?:
>   1. temporary functions (1-part path)
>   2. built-in functions
>   3. temporary functions (2-part path)
>   4. 2-part catalog functions a.k.a. "external built-in functions"
>   (cat + func) - this is still under discussion, if we want that in the 
> other
>   focal point
>   5. temporary functions (3-part path)
>   6. 3-part catalog functions a.k.a. user functions
>
> I would be really grateful if you could explain me those questions, thanks.
>
> BTW, Thank you all for a healthy discussion.
>
> Best,
>
> Dawid
> On 04/09/2019 23:25, Xuefu Z wrote:
>
> Thanks all for sharing your thoughts. I think we have gathered some useful
> initial feedback from this long discussion with a couple of focal points
> sticking out.
>
>  We will go back to do more research and adapt our proposal. Once it's
> ready, we will ask for a new round of review. If there is any disagreement,
> we will start a new discussion thread on each rather than having a mega
> discussion like this.
>
> Thanks to everyone for participating.
>
> Regards,
> Xuefu
>
>
> On Thu, Sep 5, 2019 at 2:52 AM Bowen Li  
>  wrote:
>
>
> Let me try to summarize and conclude the long thread so far:
>
> 1. For order of temp function v.s. built-in function:
>
> I think Dawid's point that temp function should be of fully qualified path
> is a better reasoning to back the newly proposed order, and i agree we
> don't need to follow Hive/Spark.
>
> 

[ANNOUNCE] Flink Forward training registration closes on September 30th

2019-09-05 Thread Fabian Hueske
Hi all,

The registration for the Flink Forward Europe training sessions closes in
four weeks.
The training takes place in Berlin at October 7th and is followed by two
days of talks by speakers from companies like Airbus, Goldman Sachs,
Netflix, Pinterest, and Workday [1].

The following four training sessions are available [2]:
* Developer Training (Java/Scala)
* Operations Training
* SQL Developer Training
* Tuning & Trouble Shooting Training

If you'd like to participate in one of the sessions, you should register
soon at
https://europe-2019.flink-forward.org/register

*ATTENTION*
Members of the Flink community get a 50% discount on training and
conference tickets if they register with the code *FFEU19-MailingList*
Apache committers (regardless of which project they contribute to) get a
free ticket if they register with their Apache email address and use the
code *FFEU19-ApacheCommitter*

Best,
Fabian

[1] https://europe-2019.flink-forward.org/conference-program
[2] https://europe-2019.flink-forward.org/training-program


Re: [DISCUSS] Adding configure and close methods to the Kafka(De)SerializationSchema interfaces

2019-09-05 Thread Arvid Heise
Hi Gyula,

when looking at the ConfluentRegistryAvroDeserializationSchema [1], it
seems like the intended way is to pass all configuration parameters in the
constructor. So you could call open there.

Could you please line out in more details why this is not enough? What
would you do in open and close respectively?

Best,

Arvid

[1]
https://github.com/apache/flink/blob/master/flink-formats/flink-avro-confluent-registry/src/main/java/org/apache/flink/formats/avro/registry/confluent/ConfluentRegistryAvroDeserializationSchema.java

On Thu, Sep 5, 2019 at 9:43 AM Gyula Fóra  wrote:

> Hi all!
>
> While implementing a new custom flink serialization schema that wraps an
> existing Kafka serializer, I realized we are missing 2 key methods that
> could be easily added:
>
> void configure(java.util.Map configs);
> void close();
>
> We could rename configure to open but Kafka serializers have a configure
> method.
> The configure method would be called when the operator start with the
> provided kafka properties and the close when it shuts down.
>
> Currently there is no way to access the properties from the schema
> interfaces or close the schema on failure.
>
> This would be a very simple addition and could be added as optional methods
> to the interface to not break any schemas that are implemented as lambdas.
>
> What do you think?
>
> Gyula
>


-- 

Arvid Heise | Senior Software Engineer



Follow us @VervericaData

--

Join Flink Forward  - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen


Re: [DISCUSS] Contribute Pulsar Flink connector back to Flink

2019-09-05 Thread Till Rohrmann
Hi everyone,

I'm wondering what the problem would be if we committed the Pulsar
connector before the new source interface is ready. If I understood it
correctly, then we need to support the old source interface anyway for the
existing connectors. By checking it in early I could see the benefit that
our users could start using the connector earlier. Moreover, it would
prevent the Pulsar integration from being delayed in case the
source interface is delayed. The only downside I see is the extra
review effort and potential fixes which might be irrelevant for the new
source interface implementation. I guess it mainly depends on how certain
we are when the new source interface will be ready.

Cheers,
Till

On Thu, Sep 5, 2019 at 8:56 AM Becket Qin  wrote:

> Hi Sijie and Yijie,
>
> Thanks for sharing your thoughts.
>
> Just want to have some update on FLIP-27. Although the FLIP wiki and
> discussion thread has been quiet for some time, a few committer /
> contributors in Flink community were actually prototyping the entire thing.
> We have made some good progress there but want to update the FLIP wiki
> after the entire thing is verified to work in case there are some last
> minute surprise in the implementation. I don't have an exact ETA yet, but I
> guess it is going to be within a month or so.
>
> I am happy to review the current Flink Pulsar connector and see if it would
> fit in FLIP-27. It would be good to avoid the case that we checked in the
> Pulsar connector with some review efforts and shortly after that the new
> Source interface is ready.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Thu, Sep 5, 2019 at 8:39 AM Yijie Shen 
> wrote:
>
> > Thanks for all the feedback and suggestions!
> >
> > As Sijie said, the goal of the connector has always been to provide
> > users with the latest features of both systems as soon as possible. We
> > propose to contribute the connector to Flink and hope to get more
> > suggestions and feedback from Flink experts to ensure the high quality
> > of the connector.
> >
> > For FLIP-27, we noticed its existence at the beginning of reworking
> > the connector implementation based on Flink 1.9; we also wanted to
> > build a connector that supports both batch and stream computing based
> > on it.
> > However, it has been inactive for some time, so we decided to provide
> > a connector with most of the new features, such as the new type system
> > and the new catalog API first. We will pay attention to the progress
> > of FLIP-27 continually and incorporate it with the connector as soon
> > as possible.
> >
> > Regarding the test status of the connector, we are following the other
> > connectors' tests in the Flink repository and aimed to provide as thorough
> > tests as we could. We are also happy to hear suggestions and
> > supervision from the Flink community to improve the stability and
> > performance of the connector continuously.
> >
> > Best,
> > Yijie
> >
> > On Thu, Sep 5, 2019 at 5:59 AM Sijie Guo  wrote:
> > >
> > > Thanks everyone for the comments and feedback.
> > >
> > > It seems to me that the main question here is about - "how can the
> Flink
> > > community maintain the connector?".
> > >
> > > Here are two thoughts from myself.
> > >
> > > 1) I think how and where to host this integration is kind of less
> > important
> > > here. I believe there can be many ways to achieve it.
> > > As part of the contribution, what we are looking for here is how these
> > two
> > > communities can build the collaboration relationship on developing
> > > the integration between Pulsar and Flink. Even we can try our best to
> > catch
> > > up all the updates in Flink community. We are still
> > > facing the fact that we have less experiences in Flink than folks in
> > Flink
> > > community. In order to make sure we maintain and deliver
> > > a high-quality pulsar-flink integration to the users who use both
> > > technologies, we need some help from the experts from Flink community.
> > >
> > > 2) We have been following FLIP-27 for a while. Originally we were
> > thinking
> > > of contributing the connectors back after integrating with the
> > > new API introduced in FLIP-27. But we decided to initiate the
> > conversation
> > > as early as possible. Because we believe there are more benefits doing
> > > it now rather than later. As part of contribution, it can help Flink
> > > community understand more about Pulsar and the potential integration
> > points.
> > > Also we can also help Flink community verify the new connector API as
> > well
> > > as other new API (e.g. catalog API).
> > >
> > > Thanks,
> > > Sijie
> > >
> > > On Wed, Sep 4, 2019 at 5:24 AM Becket Qin 
> wrote:
> > >
> > > > Hi Yijie,
> > > >
> > > > Thanks for the interest in contributing the Pulsar connector.
> > > >
> > > > In general, I think having Pulsar connector with strong support is a
> > > > valuable addition to Flink. So I am happy the shepherd this effort.
> > > > Meanwhile, I would 

[jira] [Created] (FLINK-13969) Resuming Externalized Checkpoint (rocks, incremental, scale down) end-to-end test fails on Travis

2019-09-05 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-13969:
-

 Summary: Resuming Externalized Checkpoint (rocks, incremental, 
scale down) end-to-end test fails on Travis
 Key: FLINK-13969
 URL: https://issues.apache.org/jira/browse/FLINK-13969
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Affects Versions: 1.10.0
Reporter: Till Rohrmann
 Fix For: 1.10.0


The {{Resuming Externalized Checkpoint (rocks, incremental, scale down)}} 
end-to-end test fails on Travis because its log contains an exception

{code}
org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete 
snapshot 16 for operator ArtificalKeyedStateMapper_Avro -> 
ArtificalOperatorStateMapper (2/4). Failure reason: Checkpoint was declined.
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:431)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1302)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1236)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:892)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:797)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:728)
at 
org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:88)
at 
org.apache.flink.streaming.runtime.io.CheckpointBarrierAligner.processBarrier(CheckpointBarrierAligner.java:177)
at 
org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:155)
at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.pollNextNullable(StreamTaskNetworkInput.java:118)
at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.pollNextNullable(StreamTaskNetworkInput.java:48)
at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:144)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.performDefaultAction(StreamTask.java:277)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.execution.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:147)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:404)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Cannot register Closeable, registry is already 
closed. Closing argument.
at 
org.apache.flink.util.AbstractCloseableRegistry.registerCloseable(AbstractCloseableRegistry.java:85)
at 
org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.<init>(AsyncSnapshotCallable.java:122)
at 
org.apache.flink.runtime.state.AsyncSnapshotCallable$AsyncSnapshotTask.<init>(AsyncSnapshotCallable.java:110)
at 
org.apache.flink.runtime.state.AsyncSnapshotCallable.toAsyncSnapshotFutureTask(AsyncSnapshotCallable.java:104)
at 
org.apache.flink.contrib.streaming.state.snapshot.RocksIncrementalSnapshotStrategy.doSnapshot(RocksIncrementalSnapshotStrategy.java:170)
at 
org.apache.flink.contrib.streaming.state.snapshot.RocksDBSnapshotStrategyBase.snapshot(RocksDBSnapshotStrategyBase.java:126)
at 
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.snapshot(RocksDBKeyedStateBackend.java:439)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:411)
... 17 more
{code}

https://api.travis-ci.org/v3/job/580915660/log.txt



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[DISCUSS] Adding configure and close methods to the Kafka(De)SerializationSchema interfaces

2019-09-05 Thread Gyula Fóra
Hi all!

While implementing a new custom Flink serialization schema that wraps an
existing Kafka serializer, I realized we are missing 2 key methods that
could be easily added:

void configure(java.util.Map<String, ?> configs);
void close();

We could rename configure to open but Kafka serializers have a configure
method.
The configure method would be called when the operator starts, with the
provided Kafka properties, and close would be called when it shuts down.

Currently there is no way to access the properties from the schema
interfaces or close the schema on failure.

This would be a very simple addition and could be added as optional methods
to the interface to not break any schemas that are implemented as lambdas.
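
For illustration, here is a minimal sketch of what this could look like,
assuming the two methods are added as default (no-op) methods so that
existing implementations, including lambdas, keep working. The existing
serialize() method is only reproduced roughly for context, and the exact
generics and the checked exception on close() are assumptions, not a final
design:

import java.io.Serializable;
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerRecord;

public interface KafkaSerializationSchema<T> extends Serializable {

    // existing method, unchanged (shown only for context)
    ProducerRecord<byte[], byte[]> serialize(T element, Long timestamp);

    /** Called once when the operator starts, with the Kafka properties of the connector. */
    default void configure(Map<String, ?> configs) {
        // no-op by default; mirrors org.apache.kafka.common.Configurable#configure
    }

    /** Called when the operator shuts down, including on failure. */
    default void close() throws Exception {
        // no-op by default
    }
}

The same pair of default methods would apply to KafkaDeserializationSchema.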

What do you think?

Gyula


Re: [DISCUSS] FLIP-53: Fine Grained Resource Management

2019-09-05 Thread Xintong Song
Thanks all for joining the discussion.
It seems to me that there is a consensus on the current FLIP document. So
if there is no objection, I would like to start the voting process for this
FLIP.

Thank you~

Xintong Song



On Wed, Sep 4, 2019 at 8:23 PM Andrey Zagrebin  wrote:

> Thanks for updating the FLIP Xintong. It looks good to me. I would be ok to
> start a vote for it.
>
> Best,
> Andrey
>
> On Wed, Sep 4, 2019 at 10:03 AM Xintong Song 
> wrote:
>
> > @all
> >
> > The FLIP document [1] has been updated.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
> >
> > On Tue, Sep 3, 2019 at 7:20 PM Zhu Zhu  wrote:
> >
> > > Thanks Xintong for the explanation.
> > >
> > > For question #1, I think it's good as long as DataSet job behaviors
> > remains
> > > the same.
> > >
> > > For question #2, agreed that the resource difference is small enough(at
> > > most 1 edge diff) in current supported point-wise execution edge
> > connection
> > > patterns.
> > >
> > > Thanks,
> > > Zhu Zhu
> > >
> > > Xintong Song  于2019年9月3日周二 下午6:58写道:
> > >
> > > >  Thanks for the comments, Zhu & Kurt.
> > > >
> > > > Andrey and I also had some discussions offline, and I would like to
> > first
> > > > post a summary of our discussion:
> > > >
> > > >1. The motivation of the fraction based approach is to unify
> > resource
> > > >management for both operators with specified and unknown resource
> > > >requirements.
> > > >2. The fraction based approach proposed in this FLIP should only
> > > affect
> > > >streaming jobs (both bounded and unbounded). For DataSet jobs,
> there
> > > are
> > > >already some fraction based approach (in TaskConfig and
> > > ChainedDriver),
> > > > and
> > > >we do not make any change to the existing approach.
> > > >3. The scope of this FLIP does not include discussion of how to
> set
> > > >ResourceSpec for operators.
> > > >   1. For blink jobs, the optimizer can set operator resources for
> > the
> > > >   users, according to their configurations (default: unknown)
> > > >   2. For DataStream jobs, there are no method / interface to set
> > > >   operator resources at the moment (1.10). We can have in the
> > future.
> > > >   3. For DataSet jobs, there are existing user interfaces to set
> > > >   operator resources.
> > > >4. The FLIP should explain more about how ResourceSpecs works
> > > >   1. PhysicalTransformations (deployed with operators into the
> > > >   StreamTasks) get ResourceSpec: unknown by default or known
> (e.g.
> > > > from the
> > > >   Blink planner)
> > > >   2. While generating stream graph, calculate fractions and set
> to
> > > >   StreamConfig
> > > >   3. While scheduling, convert ResourceSpec to ResourceProfile
> > > >   (ResourceSpec + network memory), and deploy to slots / TMs
> > matching
> > > > the
> > > >   resources
> > > >   4. While starting Task in TM, each operator gets fraction
> > converted
> > > >   back to the original absolute value requested by user or fair
> > > > unknown share
> > > >   of the slot
> > > >   5. We should not set `allSourcesInSamePipelinedRegion` to
> `false`
> > > for
> > > >DataSet jobs. Behaviors of DataSet jobs should not be changed.
> > > >6. The FLIP document should differentiate works planed in this
> FLIP
> > > and
> > > >the future follow-ups more clearly, by put the follow-ups in a
> > > separate
> > > >section
> > > >7. Another limitation of the rejected alternative setting
> fractions
> > at
> > > >scheduling time is that, the scheduler implementation does not
> know
> > > > which
> > > >tasks will be deployed into the same slot in advance.
> > > >
> > > > Andrey, Please bring it up if there is anything I missed.
> > > >
> > > > Zhu, regarding your comments:
> > > >
> > > >1. If we do not set `allSourcesInSamePipelinedRegion` to `false`
> for
> > > >DataSet jobs (point 5 in the discussion summary above), then there
> > > >shouldn't be any regression right?
> > > >2. I think it makes sense to set the max possible network memory
> for
> > > the
> > > >JobVertex. When you say parallel instances of the same JobVertex
> may
> > > > have
> > > >need different network memory, I guess you mean the rescale
> > scenarios
> > > > where
> > > >parallelisms of upstream / downstream vertex cannot be exactly
> > divided
> > > > by
> > > >parallelism of downstream / upstream vertex? I would say it's
> > > > acceptable to
> > > >have slight difference between actually needed and allocated
> network
> > > > memory.
> > > >3. Yes, by numOpsUseOnHeapManagedMemory I mean
> > > >numOpsUseOnHeapManagedMemoryInTheSameSharedGroup. I'll update the
> > doc.
> > > >4. Yes, it should be StreamingJobGraphGenerator. Thanks for the
> > > >

[jira] [Created] (FLINK-13968) Add travis check for the correctness of the binary licensing

2019-09-05 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-13968:


 Summary: Add travis check for the correctness of the binary 
licensing
 Key: FLINK-13968
 URL: https://issues.apache.org/jira/browse/FLINK-13968
 Project: Flink
  Issue Type: Improvement
  Components: Travis
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.10.0, 1.9.1


Since the binary licensing must be updated manually whenever the packaging of 
flink-dist or contained jars is modified, we should add an automatic check to 
ensure the licensing is updated appropriately.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13967) Generate full binary licensing via collect_license_files.sh

2019-09-05 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-13967:


 Summary: Generate full binary licensing via 
collect_license_files.sh
 Key: FLINK-13967
 URL: https://issues.apache.org/jira/browse/FLINK-13967
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.10.0, 1.9.1


Currently, {{collect_license_files.sh}} is a generic tool to collect licenses 
and concatenate NOTICE files from jars contained in a given directory.
In practice, though, this script is only used to assemble the NOTICE-binary file 
and the licenses-binary directory.

To make the script more user-friendly I propose to
# add the `-binary` suffix to the generated file/directory
# automatically add the NOTICE-binary header
# automatically add the LICENSE.slf4j file

With this we can generate the entire binary licensing without any additional 
steps being required, which will also make it easy to add an automatic check 
for the correctness of said licensing.




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13966) Jar sorting in collect_license_files.sh is locale dependent

2019-09-05 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-13966:


 Summary: Jar sorting in collect_license_files.sh is locale 
dependent
 Key: FLINK-13966
 URL: https://issues.apache.org/jira/browse/FLINK-13966
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 1.8.1
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.10.0, 1.9.1, 1.8.3


The {{collect_license_files.sh}} searches jars for NOTICE files and 
concatenates them to assemble the NOTICE-binary file. To make the order 
deterministic we sort the file paths using {{sort}}; however, this utility is 
locale-dependent.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: [DISCUSS] Contribute Pulsar Flink connector back to Flink

2019-09-05 Thread Becket Qin
Hi Sijie and Yijie,

Thanks for sharing your thoughts.

Just want to give an update on FLIP-27. Although the FLIP wiki and
discussion thread have been quiet for some time, a few committers /
contributors in the Flink community have actually been prototyping the
entire thing. We have made some good progress there but want to update
the FLIP wiki after the entire thing is verified to work, in case there
are some last-minute surprises in the implementation. I don't have an
exact ETA yet, but I guess it is going to be within a month or so.

I am happy to review the current Flink Pulsar connector and see if it would
fit in FLIP-27. It would be good to avoid the case where we check in the
Pulsar connector after considerable review effort, only for the new
Source interface to become ready shortly afterwards.

Thanks,

Jiangjie (Becket) Qin

On Thu, Sep 5, 2019 at 8:39 AM Yijie Shen  wrote:

> Thanks for all the feedback and suggestions!
>
> As Sijie said, the goal of the connector has always been to provide
> users with the latest features of both systems as soon as possible. We
> propose to contribute the connector to Flink and hope to get more
> suggestions and feedback from Flink experts to ensure the high quality
> of the connector.
>
> For FLIP-27, we noticed its existence at the beginning of reworking
> the connector implementation based on Flink 1.9; we also wanted to
> build a connector that supports both batch and stream computing based
> on it.
> However, it has been inactive for some time, so we decided to provide
> a connector with most of the new features, such as the new type system
> and the new catalog API first. We will pay attention to the progress
> of FLIP-27 continually and incorporate it with the connector as soon
> as possible.
>
> Regarding the test status of the connector, we are following the other
> connectors' tests in the Flink repository and aim to provide tests as
> thorough as we can. We are also happy to hear suggestions and
> supervision from the Flink community to improve the stability and
> performance of the connector continuously.
>
> Best,
> Yijie
>
> On Thu, Sep 5, 2019 at 5:59 AM Sijie Guo  wrote:
> >
> > Thanks everyone for the comments and feedback.
> >
> > It seems to me that the main question here is about - "how can the Flink
> > community maintain the connector?".
> >
> > Here are two thoughts from myself.
> >
> > 1) I think how and where to host this integration is kind of less
> important
> > here. I believe there can be many ways to achieve it.
> > As part of the contribution, what we are looking for here is how these
> two
> > communities can build the collaboration relationship on developing
> > the integration between Pulsar and Flink. Even we can try our best to
> catch
> > up all the updates in Flink community. We are still
> > facing the fact that we have less experience in Flink than folks in
> Flink
> > community. In order to make sure we maintain and deliver
> > a high-quality pulsar-flink integration to the users who use both
> > technologies, we need some help from the experts from Flink community.
> >
> > 2) We have been following FLIP-27 for a while. Originally we were
> thinking
> > of contributing the connectors back after integrating with the
> > new API introduced in FLIP-27. But we decided to initiate the
> conversation
> > as early as possible. Because we believe there are more benefits doing
> > it now rather than later. As part of contribution, it can help Flink
> > community understand more about Pulsar and the potential integration
> points.
> > We can also help the Flink community verify the new connector API as
> well
> > as other new API (e.g. catalog API).
> >
> > Thanks,
> > Sijie
> >
> > On Wed, Sep 4, 2019 at 5:24 AM Becket Qin  wrote:
> >
> > > Hi Yijie,
> > >
> > > Thanks for the interest in contributing the Pulsar connector.
> > >
> > > In general, I think having Pulsar connector with strong support is a
> > > valuable addition to Flink. So I am happy to shepherd this effort.
> > > Meanwhile, I would also like to provide some context and recent
> efforts on
> > > the Flink connectors ecosystem.
> > >
> > > The current way Flink maintains its connector has hit the scalability
> bar.
> > > With more and more connectors coming into Flink repo, we are facing a
> few
> > > problems such as long build and testing time. To address this problem,
> we
> > > have attempted to do the following:
> > > 1. Split out the connectors into a separate repository. This is
> temporarily
> > > on hold due to potential solution to shorten the build time.
> > > 2. Encourage the connectors to stay as ecosystem project while Flink
> tries
> > > to provide good support for functionality and compatibility tests.
> Robert
> > > has driven to create a Flink Ecosystem project website and it is going
> > > through some final approval process.
> > >
> > > Given the above efforts, it would be great to first see if we can have
> > > Pulsar connector as an ecosystem project with great support. 

Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration

2019-09-05 Thread Becket Qin
Hi Dawid,

Thanks a lot for the clarification. Got it now. A few more thoughts:

1. Naming.
I agree that the name "Configurable" is a little misleading if we
just want to use it to store POJOs. It would probably help to just name it
something like "ConfigPojo".

2. Flat config map vs. structured config map.
From a user's perspective, I personally find a flat config map usually
easier to understand than a structured config format. But it is just my
personal opinion and up for debate.

Taking the Host and CachedFile as examples, personally I think the
following format is more concise and user friendly:

Host: 192.168.0.1:1234 (single config)
Hosts: 192.168.0.1:1234, 192.168.0.2:5678 (list of configs)

CachedFile: path:file:flag (single config)
CachedFile: path1:file1:flag1, path2:file2:flag2 (list config)

Maybe for complicated POJOs the full K-V pairs would be necessary, but it
looks like we are trying to avoid such complicated POJOs to begin with. Even if
a full K-V is needed, a List<Map<String, String>> format would also be
almost equivalent to the current design.

3. The necessity of the POJO class in Configuration / ConfigOption system.
I can see the convenience of having a POJO (or ConfigObject) type supported
in the Configuration / ConfigOption. However, one thing to notice is that,
API-wise, the ConfigurableFactory can return an arbitrary type of class
instead of just a POJO. This can easily be misused or abused in cases such as
plugins. And the current API design will force such plugins to implement
methods like toConfiguration(), which is a little awkward.

Given that 1) there will not be many such POJO classes and 2) these POJO
classes are defined by Flink, I am thinking that one alternative approach
is to just have the constructors take the configuration String (or list
of Strings) and parse it (see the sketch after the list below). This would
avoid a few complications in this FLIP.
  a) No need to have the ConfigurableFactory class
  b) No need to have the toConfiguration() implementation. So there is just
one way to set values in the Configuration instance.
  c) The Configuration / ConfigOption does not have to also deal with the
Object creation. Instead, they will simply focus on configuration itself.
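
To make that alternative concrete, here is a minimal sketch. The Host class,
its fields and the "address:port" format are taken from the example above;
the parsing logic itself is just an illustrative assumption, not an actual
Flink API:

// Illustrative only: a POJO that parses itself from the flat string format
// shown above (e.g. "192.168.0.1:1234"), so no ConfigurableFactory or
// toConfiguration() round trip is needed.
public class Host {

    private final String address;
    private final int port;

    /** Parses a single flat config value such as "192.168.0.1:1234". */
    public Host(String configValue) {
        String[] parts = configValue.trim().split(":");
        if (parts.length != 2) {
            throw new IllegalArgumentException(
                    "Expected <address>:<port> but got: " + configValue);
        }
        this.address = parts[0];
        this.port = Integer.parseInt(parts[1]);
    }

    public String getAddress() {
        return address;
    }

    public int getPort() {
        return port;
    }
}

A list option such as "Hosts: 192.168.0.1:1234, 192.168.0.2:5678" would then
simply be split on ',' with this constructor applied to each element.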

Thanks for the patient discussion. I don't want to block this FLIP further,
so I am fine with going ahead with the current design, with the name of
Configurable changed to something like ConfigPojo in order to avoid misuse as much
as possible.

Thanks,

Jiangjie (Becket) Qin

On Wed, Sep 4, 2019 at 5:50 PM Dawid Wysakowicz 
wrote:

> Hi Becket,
>
> You are right, that what we had in mind for
> ExecutionConfig/CheckpointConfig etc. is the option b) from your email.
> In the context of the FLIP-54, those objects are not Configurable. What
> we understood as a Configurable by the FLIP-54 are a simple pojos, that
> are stored under a single key. Such as the examples either from the ML
> thread (Host) or from the design doc (CacheFile). So when configuring
> the host user can provide a host like this:
>
> connector.host: address:localhost, port:1234
>
> rather than
>
> connector.host.address: localhost
>
> connector.host.port: 1234
>
> This is important especially if one wants to configure lists of such
> objects:
>
> connector.hosts: address:localhost,port:1234;address:localhost,port:4567
>
> The intention was definitely not to store whole complex objects, such as
> ExecutionConfig, CheckpointConfig etc. that contain multiple different
> options. Maybe it makes sense to call it ConfigObject, as Aljoscha
> suggested? What do you think? Would that make it more understandable?
>
> For the initialization/configuration of objects such as ExecutionConfig,
> CheckpointConfig you may have a look at FLIP-59[1] where we suggest to
> add a configure method to those classes and we pretty much describe the
> process you outline in the last message.
>
> Best,
>
> Dawid
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-59%3A+Enable+execution+configuration+from+Configuration+object
>
> On 04/09/2019 03:37, Becket Qin wrote:
> > Hi Timo, Dawid and Aljoscha,
> >
> > Thanks for clarifying the goals. It is very helpful to understand the
> > motivation here. It would be great to add them to the FLIP wiki.
> >
> > I agree that the current FLIP design achieves the two goals it wants to
> > achieve. But I am trying to see is if the current approach is the most
> > reasonable approach.
> >
> > Please let me check if I understand this correctly. From end users'
> > perspective, they will do the following when they want to configure their
> > Flink Jobs.
> > 1. Create a Configuration instance, and call setters of Configuration
> with
> > the ConfigOptions defined in different components.
> > 2. The Configuration created in step 1 will be passed around, and each
> > component will just exact their own options from it.
> > 3. ExecutionConfig, CheckpointConfig (and other Config classes) will
> become
> > a Configurable, which is responsible for extracting the configuration
> > values 

Re: [VOTE] FLIP-61 Simplify Flink's cluster level RestartStrategy configuration

2019-09-05 Thread vino yang
+1 (non-binding)

Zili Chen  于2019年9月5日周四 上午10:55写道:

> +1
>
>
> zhijiang  于2019年9月5日周四 上午12:36写道:
>
> > +1
> > --
> > From:Till Rohrmann 
> > Send Time:2019年9月4日(星期三) 13:39
> > To:dev 
> > Cc:Zhu Zhu 
> > Subject:Re: [VOTE] FLIP-61 Simplify Flink's cluster level RestartStrategy
> > configuration
> >
> > +1 (binding)
> >
> > On Wed, Sep 4, 2019 at 12:39 PM Chesnay Schepler 
> > wrote:
> >
> > > +1 (binding)
> > >
> > > On 04/09/2019 11:13, Zhu Zhu wrote:
> > > > +1 (non-binding)
> > > >
> > > > Thanks,
> > > > Zhu Zhu
> > > >
> > > > Till Rohrmann  于2019年9月4日周三 下午5:05写道:
> > > >
> > > >> Hi everyone,
> > > >>
> > > >> I would like to start the voting process for FLIP-61 [1], which is
> > > >> discussed and reached consensus in this thread [2].
> > > >>
> > > >> Since the change is rather small I'd like to shorten the voting
> period
> > > to
> > > >> 48 hours. Hence, I'll try to close it September 6th, 11:00 am CET,
> > > unless
> > > >> there is an objection or not enough votes.
> > > >>
> > > >> [1]
> > > >>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-61+Simplify+Flink%27s+cluster+level+RestartStrategy+configuration
> > > >> [2]
> > > >>
> > > >>
> > >
> >
> https://lists.apache.org/thread.html/e206390127bcbd9b24d9c41a838faa75157e468e01552ad241e3e24b@%3Cdev.flink.apache.org%3E
> > > >>
> > > >> Cheers,
> > > >> Till
> > > >>
> > >
> > >
> >
> >
>


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-05 Thread Dawid Wysakowicz
Hi Xuefu,

Just wanted to summarize my opinion on one topic (temporary functions).

My preference would be to make temporary functions always 3-part
qualified (as a result that would prohibit overriding built-in
functions). Having said that, if the community decides that it's better
to allow overriding built-in functions, I am fine with it and can commit
to that decision.

I wanted to ask if you could clarify a few points for me around that option.

 1. Would you enforce temporary functions to always be just a single
name (without db & cat), as Hive does, or would you also allow 3- or
even 2-part identifiers?
 2. Assuming 2/3-part paths, how would you register a function from the
following statement: CREATE TEMPORARY FUNCTION db.func? Would that
shadow all functions named 'func' in all databases named 'db' in all
catalogs? Or would it shadow only the function 'func' in database 'db'
in the current catalog?
 3. This point is still under discussion, but it was mentioned a few times
that we may want to enable the syntax cat.func for "external built-in
functions". How would that affect the statement from the previous point?
Would 'db.func' shadow an "external built-in function" in the 'db' catalog,
or user functions as in point 2? Or maybe both?
 4. Lastly, to summarize the previous points: assuming 2/3-part
paths, would the function resolution actually be as follows (see also
the sketch after this list)?:
 1. temporary functions (1-part path)
 2. built-in functions
 3. temporary functions (2-part path)
 4. 2-part catalog functions a.k.a. "external built-in functions"
(cat + func) - this is still under discussion as part of the other
focal point
 5. temporary functions (3-part path)
 6. 3-part catalog functions a.k.a. user functions
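
To make sure I read point 4 correctly, here is a purely illustrative sketch
of that lookup order. None of these class or field names exist in Flink; the
code only spells out the resolution sequence listed above:

import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of the resolution order from point 4. */
class FunctionResolutionOrderSketch {

    // hypothetical registries, keyed by the identifier the function was registered under
    Map<String, String> tempFunctions1Part = new HashMap<>();        // func
    Map<String, String> builtinFunctions = new HashMap<>();          // func
    Map<String, String> tempFunctions2Part = new HashMap<>();        // db.func
    Map<String, String> externalBuiltinFunctions = new HashMap<>();  // cat.func (under discussion)
    Map<String, String> tempFunctions3Part = new HashMap<>();        // cat.db.func
    Map<String, String> catalogFunctions = new HashMap<>();          // cat.db.func

    /** Resolves a reference to "func" with current catalog "cat" and current database "db". */
    Optional<String> resolve(String cat, String db, String func) {
        String twoPart = db + "." + func;
        String threePart = cat + "." + db + "." + func;
        return firstPresent(
                tempFunctions1Part.get(func),                    // 1. temporary functions (1-part)
                builtinFunctions.get(func),                      // 2. built-in functions
                tempFunctions2Part.get(twoPart),                 // 3. temporary functions (2-part)
                externalBuiltinFunctions.get(cat + "." + func),  // 4. "external built-in functions"
                tempFunctions3Part.get(threePart),               // 5. temporary functions (3-part)
                catalogFunctions.get(threePart));                // 6. 3-part catalog functions
    }

    private static Optional<String> firstPresent(String... candidates) {
        for (String candidate : candidates) {
            if (candidate != null) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}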

I would be really grateful if you could clarify those questions for me, thanks.

BTW, Thank you all for a healthy discussion.

Best,

Dawid

On 04/09/2019 23:25, Xuefu Z wrote:
> Thanks all for sharing your thoughts. I think we have gathered some useful
> initial feedback from this long discussion with a couple of focal points
> sticking out.
>
>  We will go back to do more research and adapt our proposal. Once it's
> ready, we will ask for a new round of review. If there is any disagreement,
> we will start a new discussion thread on each rather than having a mega
> discussion like this.
>
> Thanks to everyone for participating.
>
> Regards,
> Xuefu
>
>
> On Thu, Sep 5, 2019 at 2:52 AM Bowen Li  wrote:
>
>> Let me try to summarize and conclude the long thread so far:
>>
>> 1. For order of temp function v.s. built-in function:
>>
>> I think Dawid's point that temp function should be of fully qualified path
>> is a better reasoning to back the newly proposed order, and i agree we
>> don't need to follow Hive/Spark.
>>
>> However, I'd rather not change fundamentals of temporary functions in this
>> FLIP. It belongs to a bigger story of how temporary objects should be
>> redefined and be handled uniformly - currently temporary tables and views
>> (those registered from TableEnv#registerTable()) behave differently from what
>> Dawid proposes for temp functions, and we need a FLIP to just unify their
>> APIs and behaviors.
>>
>> I agree that backward compatibility is not an issue w.r.t Jark's points.
>>
>> ***Seems we do have consensus that it's acceptable to prevent users
>> registering a temp function in the same name as a built-in function. To
>> help us move forward, I'd like to propose setting such a restraint on temp
>> functions in this FLIP to simplify the design and avoid disputes.*** It
>> will also leave rooms for improvements in the future.
>>
>>
>> 2. For Hive built-in function:
>>
>> Thanks Timo for providing the Presto and Postgres examples. I feel modular
>> built-in functions can be a good fit for the geo and ml example as a native
>> Flink extension, but not sure if it fits well with external integrations.
>> Anyway, I think modular built-in functions is a bigger story and can be on
>> its own thread too, and our proposal doesn't prevent Flink from doing that
>> in the future.
>>
>> ***Seems we have consensus that users should be able to use built-in
>> functions of Hive or other external systems in SQL explicitly and
>> deterministically regardless of Flink built-in functions and the potential
>> modular built-in functions, via some new syntax like "mycat::func"? If so,
>> I'd like to propose removing Hive built-in functions from ambiguous
>> function resolution order, and empower users with such a syntax. This way
>> we sacrifice a little convenience for certainty***
>>
>>
>> What do you think?
>>
>> On Wed, Sep 4, 2019 at 7:02 AM Dawid Wysakowicz 
>> wrote:
>>
>>> Hi,
>>>
>>> Regarding the Hive & Spark support of TEMPORARY FUNCTIONS. I've just
>>> performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they
>> are
>>> very inconsistent in that manner (spark being way worse on that).
>>>
>>> Hive:
>>>
>>> You cannot overwrite all 

Re: Re: [DISCUSS] Support JSON functions in Flink SQL

2019-09-05 Thread TANG Wen-hui
+1 
I have done similar work before. 
Looking forward to discussing this feature.

Best
wenhui



winifred.wenhui.t...@gmail.com
 
From: Kurt Young
Date: 2019-09-05 14:00
To: dev
CC: Anyang Hu
Subject: Re: [DISCUSS] Support JSON functions in Flink SQL
+1 to add JSON support to Flink. We also see lots of requirements for JSON
related functions in our internal platform. Since these are already SQL
standard, I think it's a good time to add them to Flink.
 
Best,
Kurt
 
 
On Thu, Sep 5, 2019 at 10:37 AM Qi Luo  wrote:
 
> We also see strong demands from our SQL users for JSON/Date related
> functions.
>
> Also +Anyang Hu 
>
> On Wed, Sep 4, 2019 at 9:51 PM Jark Wu  wrote:
>
> > Hi Forward,
> >
> > Thanks for bringing this discussion and preparing the nice design.
> > I think it's nice to have the JSON functions in the next release.
> > We have received some requirements for this feature.
> >
> > I can help to shepherd this JSON functions effort and will leave comments
> >  in the design doc in the next days.
> >
> > Hi Danny,
> >
> > The new introduced JSON functions are from SQL:2016, not from MySQL.
> > So there no JSON type is needed. According to the SQL:2016, the
> > representation of JSON data can be "character string" which is also
> > the current implementation in Calcite[1].
> >
> > Best,
> > Jark
> >
> >
> > [1]: https://calcite.apache.org/docs/reference.html#json-functions
> >
> >
> > On Wed, 4 Sep 2019 at 21:22, Xu Forward  wrote:
> >
> > > hi Danny Chan ,Thank you very much for your reply, your help can help
> me
> > > further improve this discussion.
> > > Best
> > > forward
> > >
> > > Danny Chan  于2019年9月4日周三 下午8:50写道:
> > >
> > > > Thanks Xu Forward for bringing up this topic, I think the JSON functions
> > are
> > > > very useful especially for those MySQL users.
> > > >
> > > > I saw that you have done some work within the Apache Calcite, that’s
> a
> > > > good start, but this is one concern from me, Flink doesn’t support
> JSON
> > > > type internal, so how to represent a JSON object in Flink maybe a key
> > > point
> > > > we need to resolve. In Calcite, we use ANY type to represent as the
> > JSON,
> > > > but I don’t think it is the right way to go, maybe we can have a
> > > discussion
> > > > here.
> > > >
> > > > Best,
> > > > Danny Chan
> > > > 在 2019年9月4日 +0800 PM8:34,Xu Forward ,写道:
> > > > > Hi everybody,
> > > > >
> > > > > I'd like to kick off a discussion on Support JSON functions in
> Flink
> > > SQL.
> > > > >
> > > > > The entire plan is divided into two steps:
> > > > > 1. Implement Support SQL 2016-2017 JSON functions in Flink SQL[1].
> > > > > 2. Implement non-Support SQL 2016-2017 JSON functions in Flink SQL,
> > > such
> > > > as
> > > > > JSON_TYPE in Mysql, JSON_LENGTH, etc. Very useful JSON functions.
> > > > >
> > > > > Would love to hear your thoughts.
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1JfaFYIFOAY8P2pFhOYNCQ9RTzwF4l85_bnTvImOLKMk/edit#heading=h.76mb88ca6yjp
> > > > >
> > > > > Best,
> > > > > ForwardXu
> > > >
> > >
> >
>


Re: [DISCUSS] Support JSON functions in Flink SQL

2019-09-05 Thread Kurt Young
+1 to add JSON support to Flink. We also see lots of requirements for JSON
related functions in our internal platform. Since these are already SQL
standard, I think it's a good time to add them to Flink.

Best,
Kurt


On Thu, Sep 5, 2019 at 10:37 AM Qi Luo  wrote:

> We also see strong demands from our SQL users for JSON/Date related
> functions.
>
> Also +Anyang Hu 
>
> On Wed, Sep 4, 2019 at 9:51 PM Jark Wu  wrote:
>
> > Hi Forward,
> >
> > Thanks for bringing this discussion and preparing the nice design.
> > I think it's nice to have the JSON functions in the next release.
> > We have received some requirements for this feature.
> >
> > I can help to shepherd this JSON functions effort and will leave comments
> >  in the design doc in the next days.
> >
> > Hi Danny,
> >
> > The newly introduced JSON functions are from SQL:2016, not from MySQL.
> > So no JSON type is needed. According to SQL:2016, the
> > representation of JSON data can be "character string" which is also
> > the current implementation in Calcite[1].
> >
> > Best,
> > Jark
> >
> >
> > [1]: https://calcite.apache.org/docs/reference.html#json-functions
> >
> >
> > On Wed, 4 Sep 2019 at 21:22, Xu Forward  wrote:
> >
> > > hi Danny Chan ,Thank you very much for your reply, your help can help
> me
> > > further improve this discussion.
> > > Best
> > > forward
> > >
> > > Danny Chan  于2019年9月4日周三 下午8:50写道:
> > >
> > > > Thanks Xu Forward for bringing up this topic, I think the JSON functions
> > are
> > > > very useful especially for those MySQL users.
> > > >
> > > > I saw that you have done some work within the Apache Calcite, that’s
> a
> > > > good start, but this is one concern from me, Flink doesn’t support
> JSON
> > > > type internal, so how to represent a JSON object in Flink maybe a key
> > > point
> > > > we need to resolve. In Calcite, we use ANY type to represent as the
> > JSON,
> > > > but I don’t think it is the right way to go, maybe we can have a
> > > discussion
> > > > here.
> > > >
> > > > Best,
> > > > Danny Chan
> > > > 在 2019年9月4日 +0800 PM8:34,Xu Forward ,写道:
> > > > > Hi everybody,
> > > > >
> > > > > I'd like to kick off a discussion on Support JSON functions in
> Flink
> > > SQL.
> > > > >
> > > > > The entire plan is divided into two steps:
> > > > > 1. Implement Support SQL 2016-2017 JSON functions in Flink SQL[1].
> > > > > 2. Implement non-Support SQL 2016-2017 JSON functions in Flink SQL,
> > > such
> > > > as
> > > > > JSON_TYPE in Mysql, JSON_LENGTH, etc. Very useful JSON functions.
> > > > >
> > > > > Would love to hear your thoughts.
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1JfaFYIFOAY8P2pFhOYNCQ9RTzwF4l85_bnTvImOLKMk/edit#heading=h.76mb88ca6yjp
> > > > >
> > > > > Best,
> > > > > ForwardXu
> > > >
> > >
> >
>