Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Terry Wang
Congratulations! 

Best,
Terry Wang



> On Jan 17, 2020, at 14:09, Biao Liu wrote:
> 
> Congrats!
> 
> Thanks,
> Biao /'bɪ.aʊ/
> 
> 
> 
> On Fri, 17 Jan 2020 at 13:43, Rui Li wrote:
> Congratulations Dian, well deserved!
> 
> On Thu, Jan 16, 2020 at 5:58 PM jincheng sun wrote:
> Hi everyone,
> 
> I'm very happy to announce that Dian accepted the offer of the Flink PMC to 
> become a committer of the Flink project.
> 
> Dian Fu has been contributing to Flink for many years. He has played an 
> essential role in the PyFlink/CEP/SQL/Table API modules, contributed 
> several major features, reported and fixed many bugs, spent a lot of time 
> reviewing pull requests, frequently helped out on the user mailing 
> lists, and checked and voted on releases.
> 
> Please join me in congratulating Dian on becoming a Flink committer!
> 
> Best, 
> Jincheng(on behalf of the Flink PMC)
> 
> 
> -- 
> Best regards!
> Rui Li



Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

2020-01-16 Thread Biao Liu
+1

I think that's how it should be. Timers should align with other regular
state.

If users want better performance and have no memory concerns, the memory or
FS state backend might be considered. Or maybe we could optimize
performance by introducing a dedicated column family for timers, which
could have its own tuned options.
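For reference, the switch being discussed here can already be flipped via configuration. A minimal sketch, assuming the option name of the Flink 1.9/1.10 series (please verify against the documentation for your version):

```
# flink-conf.yaml
state.backend: rocksdb
# HEAP (the current default under discussion) keeps timers on the JVM heap;
# ROCKSDB stores them in RocksDB like other keyed state.
state.backend.rocksdb.timer-service.factory: ROCKSDB
```

The same choice can also be made programmatically on the RocksDB state backend via its priority-queue state type setter.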

Thanks,
Biao /'bɪ.aʊ/



On Fri, 17 Jan 2020 at 10:11, Jingsong Li  wrote:

> Hi Stephan,
>
> Thanks for starting this discussion.
> +1 for storing timers in RocksDB by default.
> In the past, when Flink didn't keep timers in RocksDB, I had a
> headache: I always had to adjust parameters carefully to ensure there was
> no risk of running out of memory.
>
> Just curious, how big is the performance impact of heap vs. RocksDB timers?
> - if there is no order-of-magnitude difference between heap and RocksDB,
> there is no problem in using RocksDB.
> - if there is, maybe we should improve our documentation to let users know
> about this option. (It looks like a lot of users didn't know.)
>
> Best,
> Jingsong Lee
>
> On Fri, Jan 17, 2020 at 3:18 AM Yun Tang  wrote:
>
>> Hi Stephan,
>>
>> I am +1 for the change which stores timers in RocksDB by default.
>>
>> Some users hope the checkpoint can complete as fast as possible, which
>> also requires timers to be stored in RocksDB so that they do not affect
>> the synchronous part of the checkpoint.
>>
>> Best
>> Yun Tang
>> --
>> *From:* Andrey Zagrebin 
>> *Sent:* Friday, January 17, 2020 0:07
>> *To:* Stephan Ewen 
>> *Cc:* dev ; user 
>> *Subject:* Re: [DISCUSS] Change default for RocksDB timers: Java Heap =>
>> in RocksDB
>>
>> Hi Stephan,
>>
>> Thanks for starting this discussion. I am +1 for this change.
>> In general, the number of timer state keys can be of the same order as the
>> number of main state keys.
>> So if RocksDB is used for the main state for scalability, it makes sense to
>> have timers there as well,
>> unless timers are used for only a very limited subset of keys which fits
>> into memory.
>>
>> Best,
>> Andrey
>>
>> On Thu, Jan 16, 2020 at 4:27 PM Stephan Ewen  wrote:
>>
>> Hi all!
>>
>> I would suggest a change of the current default for timers. A bit of
>> background:
>>
>>   - Timers (for windows, process functions, etc.) are state that is
>> managed and checkpointed as well.
>>   - When using the MemoryStateBackend and the FsStateBackend, timers are
>> kept on the JVM heap, like regular state.
>>   - When using the RocksDBStateBackend, timers can be kept in RocksDB
>> (like other state) or on the JVM heap. The JVM heap is the default though!
>>
>> I find this a bit un-intuitive and would propose to change this to let
>> the RocksDBStateBackend store all state in RocksDB by default.
>> The rationale being that if there is a tradeoff (like here), safe and
>> scalable should be the default and unsafe performance be an explicit choice.
>>
>> This sentiment seems to be shared by various users as well, see
>> https://twitter.com/StephanEwen/status/1214590846168903680 and
>> https://twitter.com/StephanEwen/status/1214594273565388801
>> We would of course keep the switch and mention in the performance tuning
>> section that this is an option.
>>
>> # RocksDB State Backend Timers on Heap
>>   - Pro: faster
>>   - Con: not memory safe, GC overhead, longer synchronous checkpoint
>> time, no incremental checkpoints
>>
>> # RocksDB State Backend Timers in RocksDB
>>   - Pro: safe and scalable, asynchronously and incrementally checkpointed
>>   - Con: performance overhead.
>>
>> Please chime in and let me know what you think.
>>
>> Best,
>> Stephan
>>
>>
>
> --
> Best, Jingsong Lee
>


Re: [DISCUSS] Make AppendingState#add refuse to add null element

2020-01-16 Thread Yun Tang
@Aljoscha Krettek , got it and thanks for your 
clarification.

Best
Yun Tang

Get Outlook for Android


From: Aljoscha Krettek 
Sent: Thursday, January 16, 2020 8:40:57 PM
To: dev@flink.apache.org 
Subject: Re: [DISCUSS] Make AppendingState#add refuse to add null element

This is mostly a remnant from the previous state API, see [1] for
reference. The behaviour was basically copied to the new state
implementations, which was a mistake, in hindsight. Also see [2] where I
added AppendingState, here I also didn't document any special null
behaviour.

Best,
Aljoscha

[1]
https://github.com/apache/flink/commit/caf46728045c0b886e6d4ec0aa429a830740a391
[2]
https://github.com/apache/flink/commit/6cd8ceb10c841827cf89b74ecf5a0495a6933d53

On 16.01.20 04:13, Yun Tang wrote:
> +1 for unifying the behavior of AppendingState#add .
>
> However, I have a concern about the usage of the window reduce function [1]: I'm 
> not sure whether users rely on processing a StreamRecord(null) to clear 
> state. As you can see, users cannot see the reducing window state directly, 
> and the only way to interact with the state is via processing records.
>
> I'm not sure whether this is by design, @Aljoscha 
> Krettek  would you please share the initial idea 
> when introducing this for the first time?
>
>
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#reducefunction
>
> Best
> Yun Tang
>
> 
> From: Yu Li 
> Sent: Thursday, January 9, 2020 14:09
> To: dev 
> Subject: Re: [DISCUSS] Make AppendingState#add refuse to add null element
>
> +1 for unifying the behavior to refusing adding null element. Nice catch
> and thanks for bringing up the discussion!
>
> Best Regards,
> Yu
>
>
> On Wed, 8 Jan 2020 at 22:50, Aljoscha Krettek  wrote:
>
>> Hi,
>>
>> As I said in the discussion on the Jira issue, I’m in favour of this
>> change!
>>
>> This is the Jira Issue, for reference:
>> https://issues.apache.org/jira/browse/FLINK-15424
>>
>> Best,
>> Aljoscha
>>
>>> On 8. Jan 2020, at 15:16, Congxian Qiu  wrote:
>>>
>>> Dear All
>>>
>>>
>>> Currently, we have found that the implementations of AppendingState#add
>>> are not consistent. To take some as examples:
>>>
>>>- HeapReducingState will clear the state if a null element is added
>>>- RocksDBReducingState will add a null element if the serializer can
>>>serialize null
>>>- Both HeapListState and RocksDBListState refuse to add null elements ―
>>>they will throw a NullPointerException
>>>
>>>
>>> we think this needs to be fixed, and possible solutions include:
>>>
>>>1. Respect the current Javadoc, which says “If null is passed in, the
>>>state value will remain unchanged”
>>>2. Make all AppendingState#add implementations refuse to add null elements
>>>
>>>
>>> We propose to apply the second solution, following the recommendation in
>>> Guava[1].
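To make the contract of solution 2 concrete, here is a minimal, self-contained sketch (a hypothetical class, not Flink's actual implementation) of an append-style state that refuses null elements:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of solution 2: add() rejects null eagerly
// instead of silently clearing state or serializing the null.
public class NullRefusingListState {
    private final List<String> values = new ArrayList<>();

    // The proposed contract: adding null always throws, on every backend.
    public void add(String value) {
        if (value == null) {
            throw new NullPointerException(
                    "null elements must not be added to an AppendingState");
        }
        values.add(value);
    }

    public List<String> get() {
        return values;
    }

    public static void main(String[] args) {
        NullRefusingListState state = new NullRefusingListState();
        state.add("a");
        boolean refused = false;
        try {
            state.add(null);
        } catch (NullPointerException expected) {
            refused = true;
        }
        // The state is unchanged by the rejected add.
        System.out.println(state.get().size() + " " + refused); // prints "1 true"
    }
}
```

With this behavior, HeapReducingState's silent clear-on-null and RocksDBReducingState's serialized null both become impossible; callers have to call clear() explicitly to drop state.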
>>>
>>>
>>> Would love to hear your thoughts. Thanks.
>>>
>>>
>>> Regards,
>>>
>>> Congxian
>>>
>>>
>>> [1] https://github.com/google/guava/wiki/UsingAndAvoidingNullExplained
>>
>>
>


Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Biao Liu
Congrats!

Thanks,
Biao /'bɪ.aʊ/



On Fri, 17 Jan 2020 at 13:43, Rui Li  wrote:

> Congratulations Dian, well deserved!
>
> On Thu, Jan 16, 2020 at 5:58 PM jincheng sun 
> wrote:
>
>> Hi everyone,
>>
>> I'm very happy to announce that Dian accepted the offer of the Flink PMC
>> to become a committer of the Flink project.
>>
>> Dian Fu has been contributing to Flink for many years. He has played an
>> essential role in the PyFlink/CEP/SQL/Table API modules, contributed
>> several major features, reported and fixed many bugs, spent a lot of time
>> reviewing pull requests, frequently helped out on the user mailing
>> lists, and checked and voted on releases.
>>
>> Please join me in congratulating Dian on becoming a Flink committer!
>>
>> Best,
>> Jincheng(on behalf of the Flink PMC)
>>
>
>
> --
> Best regards!
> Rui Li
>


Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Rui Li
Congratulations Dian, well deserved!

On Thu, Jan 16, 2020 at 5:58 PM jincheng sun 
wrote:

> Hi everyone,
>
> I'm very happy to announce that Dian accepted the offer of the Flink PMC
> to become a committer of the Flink project.
>
> Dian Fu has been contributing to Flink for many years. He has played an
> essential role in the PyFlink/CEP/SQL/Table API modules, contributed
> several major features, reported and fixed many bugs, spent a lot of time
> reviewing pull requests, frequently helped out on the user mailing
> lists, and checked and voted on releases.
>
> Please join me in congratulating Dian on becoming a Flink committer!
>
> Best,
> Jincheng(on behalf of the Flink PMC)
>


-- 
Best regards!
Rui Li


[jira] [Created] (FLINK-15626) Remove legacy scheduler

2020-01-16 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-15626:
---

 Summary: Remove legacy scheduler
 Key: FLINK-15626
 URL: https://issues.apache.org/jira/browse/FLINK-15626
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Coordination
Affects Versions: 1.11.0
Reporter: Zhu Zhu
 Fix For: 1.11.0


This umbrella ticket tracks the tickets to remove the legacy scheduler and 
related components, so that we can have a much cleaner scheduler framework 
that significantly simplifies our future work on job scheduling.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15625) flink sql multiple statements syntactic validation support

2020-01-16 Thread jackylau (Jira)
jackylau created FLINK-15625:


 Summary: flink sql multiple statements syntactic validation support
 Key: FLINK-15625
 URL: https://issues.apache.org/jira/browse/FLINK-15625
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Legacy Planner
Reporter: jackylau
 Fix For: 1.10.0


We know that Flink supports multiple-statement syntactic validation via Calcite, 
which validates SQL statements one by one; it does not validate table names and 
other references defined by previous statements, so we only learn about SQL 
syntax errors when we submit the Flink application.

I think this is eagerly needed by users, and we hope the Flink community will 
support it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Yuan Mei
Congrats!

Best

Yuan

On Thu, Jan 16, 2020 at 5:59 PM jincheng sun 
wrote:

> Hi everyone,
>
> I'm very happy to announce that Dian accepted the offer of the Flink PMC to
> become a committer of the Flink project.
>
> Dian Fu has been contributing to Flink for many years. He has played an
> essential role in the PyFlink/CEP/SQL/Table API modules, contributed
> several major features, reported and fixed many bugs, spent a lot of time
> reviewing pull requests, frequently helped out on the user mailing
> lists, and checked and voted on releases.
>
> Please join me in congratulating Dian on becoming a Flink committer!
>
> Best,
> Jincheng(on behalf of the Flink PMC)
>


Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Paul Lam
Congrats, Dian!

Best,
Paul Lam

> On Jan 17, 2020, at 10:49, tison wrote:
> 
> Congratulations! Dian
> 
> Best,
> tison.
> 
> 
> Zhu Zhu <reed...@gmail.com> wrote on Fri, Jan 17, 2020 at 10:47 AM:
> Congratulations Dian.
> 
> Thanks,
> Zhu Zhu
> 
> hailongwang <18868816...@163.com> wrote on Fri, Jan 17, 2020 at 10:01 AM:
> 
> Congratulations Dian !
> 
> Best,
> Hailong Wang
> 
> 
> 
> 
> On 2020-01-16 21:15:34, "Congxian Qiu" wrote:
> 
> Congratulations Dian Fu
> 
> Best,
> Congxian
> 
> 
> Jark Wu <imj...@gmail.com> wrote on Thu, Jan 16, 2020 at 7:44 PM:
> Congratulations Dian and welcome on board!
> 
> Best,
> Jark
> 
> On Thu, 16 Jan 2020 at 19:32, Jingsong Li wrote:
> 
> > Congratulations Dian Fu. Well deserved!
> >
> > Best,
> > Jingsong Lee
> >
> > On Thu, Jan 16, 2020 at 6:26 PM jincheng sun
> > wrote:
> >
> >> Congrats Dian Fu and welcome on board!
> >>
> >> Best,
> >> Jincheng
> >>
> >> Shuo Cheng <njucs...@gmail.com> wrote on Thu, Jan 16, 2020 at 6:22 PM:
> >>
> >>> Congratulations!  Dian Fu
> >>>
> >>> > Xingbo, Wei Zhong wrote on Thu, Jan 16, 2020 at 6:13 PM; jincheng sun
> >>> wrote on Thu, Jan 16, 2020 at 5:58 PM:
> >>>
> >>
> >
> > --
> > Best, Jingsong Lee
> >
> 
> 
> 
>  
> 



Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread tison
Congratulations! Dian

Best,
tison.


Zhu Zhu wrote on Fri, Jan 17, 2020 at 10:47 AM:

> Congratulations Dian.
>
> Thanks,
> Zhu Zhu
>
> hailongwang <18868816...@163.com> wrote on Fri, Jan 17, 2020 at 10:01 AM:
>
>>
>> Congratulations Dian !
>>
>> Best,
>> Hailong Wang
>>
>>
>>
>>
>> On 2020-01-16 21:15:34, "Congxian Qiu" wrote:
>>
>> Congratulations Dian Fu
>>
>> Best,
>> Congxian
>>
>>
>> Jark Wu wrote on Thu, Jan 16, 2020 at 7:44 PM:
>>
>>> Congratulations Dian and welcome on board!
>>>
>>> Best,
>>> Jark
>>>
>>> On Thu, 16 Jan 2020 at 19:32, Jingsong Li 
>>> wrote:
>>>
>>> > Congratulations Dian Fu. Well deserved!
>>> >
>>> > Best,
>>> > Jingsong Lee
>>> >
>>> > On Thu, Jan 16, 2020 at 6:26 PM jincheng sun
>>> > wrote:
>>> >
>>> >> Congrats Dian Fu and welcome on board!
>>> >>
>>> >> Best,
>>> >> Jincheng
>>> >>
>>> >> Shuo Cheng wrote on Thu, Jan 16, 2020 at 6:22 PM:
>>> >>
>>> >>> Congratulations!  Dian Fu
>>> >>>
>>> >>> > Xingbo, Wei Zhong wrote on Thu, Jan 16, 2020 at 6:13 PM; jincheng sun
>>> >>> wrote on Thu, Jan 16, 2020 at 5:58 PM:
>>> >>>
>>> >>
>>> >
>>> > --
>>> > Best, Jingsong Lee
>>> >
>>>
>>
>>
>>
>>
>>
>


Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Zhu Zhu
Congratulations Dian.

Thanks,
Zhu Zhu

hailongwang <18868816...@163.com> wrote on Fri, Jan 17, 2020 at 10:01 AM:

>
> Congratulations Dian !
>
> Best,
> Hailong Wang
>
>
>
>
> On 2020-01-16 21:15:34, "Congxian Qiu" wrote:
>
> Congratulations Dian Fu
>
> Best,
> Congxian
>
>
> Jark Wu wrote on Thu, Jan 16, 2020 at 7:44 PM:
>
>> Congratulations Dian and welcome on board!
>>
>> Best,
>> Jark
>>
>> On Thu, 16 Jan 2020 at 19:32, Jingsong Li  wrote:
>>
>> > Congratulations Dian Fu. Well deserved!
>> >
>> > Best,
>> > Jingsong Lee
>> >
>> > On Thu, Jan 16, 2020 at 6:26 PM jincheng sun 
>> > wrote:
>> >
>> >> Congrats Dian Fu and welcome on board!
>> >>
>> >> Best,
>> >> Jincheng
>> >>
>> >> Shuo Cheng wrote on Thu, Jan 16, 2020 at 6:22 PM:
>> >>
>> >>> Congratulations!  Dian Fu
>> >>>
>> >>> > Xingbo, Wei Zhong wrote on Thu, Jan 16, 2020 at 6:13 PM; jincheng sun
>> >>> wrote on Thu, Jan 16, 2020 at 5:58 PM:
>> >>>
>> >>
>> >
>> > --
>> > Best, Jingsong Lee
>> >
>>
>
>
>
>
>


Re: [DISCUSS] FLIP-72: Introduce Pulsar Connector

2020-01-16 Thread Yijie Shen
Hi everyone,

I've updated the Catalog PR and made all settings lowercase. Tests have been
added as well.
Hi Bowen, could you please take a look?
https://github.com/apache/flink/pull/10455

For the sink part of the connector, I've made a separate PR
https://github.com/apache/flink/pull/10875. Could someone help review this?


Best,
Yijie


On Thu, Jan 9, 2020 at 8:44 AM Bowen Li  wrote:

> Hi Yijie,
>
> There's just one more concern on the yaml configs. Otherwise, I think we
> should be good to go.
>
> Can you update your PR and ensure all tests pass? I can help review and
> merge in the next couple weeks.
>
> Thanks,
> Bowen
>
>
> On Mon, Dec 23, 2019 at 7:03 PM Yijie Shen 
> wrote:
>
> > Hi Bowen,
> >
> > I've updated the design doc, PTAL.
> > Btw the PR for catalog is https://github.com/apache/flink/pull/10455,
> > could
> > you please take a look?
> >
> > Best,
> > Yijie
> >
> > On Mon, Dec 9, 2019 at 8:44 AM Bowen Li  wrote:
> >
> > > Hi Yijie,
> > >
> > > I took a look at the design doc. LGTM overall, left a few questions.
> > >
> > > On Tue, Dec 3, 2019 at 10:39 PM Becket Qin 
> wrote:
> > >
> > > > Yes, you are absolutely right. Cannot believe I posted in the wrong
> > > > thread...
> > > >
> > > > On Wed, Dec 4, 2019 at 1:46 PM Jark Wu  wrote:
> > > >
> > > >> Thanks Becket for the update,
> > > >>
> > > >> But shouldn't this message be posted in FLIP-27 discussion
> thread[1]?
> > > >>
> > > >>
> > > >> Best,
> > > >> Jark
> > > >>
> > > >> [1]:
> > > >>
> > > >>
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-27-Refactor-Source-Interface-td24952.html
> > > >>
> > > >> On Wed, 4 Dec 2019 at 12:12, Becket Qin 
> wrote:
> > > >>
> > > >> > Hi all,
> > > >> >
> > > >> > Sorry for the long belated update. I have updated FLIP-27 wiki
> page
> > > with
> > > >> > the latest proposals. Some noticeable changes include:
> > > >> > 1. A new generic communication mechanism between SplitEnumerator
> and
> > > >> > SourceReader.
> > > >> > 2. Some detail API method signature changes.
> > > >> >
> > > >> > We left a few things out of this FLIP and will address them in
> > > separate
> > > >> > FLIPs. Including:
> > > >> > 1. Per split event time.
> > > >> > 2. Event time alignment.
> > > >> > 3. Fine grained failover for SplitEnumerator failure.
> > > >> >
> > > >> > Please let us know if you have any question.
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> > Jiangjie (Becket) Qin
> > > >> >
> > > >> > On Tue, Nov 19, 2019 at 10:28 AM Yijie Shen <
> > > henry.yijies...@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Hi everyone,
> > > >> > >
> > > >> > > I've put the catalog part design in separate doc with more
> details
> > > for
> > > >> > > easier communication.
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://docs.google.com/document/d/1LMnABtXn-wQedsmWv8hopvx-B-jbdr8-jHbIiDhdsoE/edit?usp=sharing
> > > >> > >
> > > >> > > I would love to hear your thoughts on this.
> > > >> > >
> > > >> > > Best,
> > > >> > > Yijie
> > > >> > >
> > > >> > > On Mon, Oct 21, 2019 at 11:15 AM Yijie Shen <
> > > >> henry.yijies...@gmail.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Hi everyone,
> > > >> > > >
> > > >> > > > Glad to receive your valuable feedbacks.
> > > >> > > >
> > > >> > > > I'd first separate the Pulsar catalog into another doc and show
> > more
> > > >> > design
> > > >> > > > and implementation details there.
> > > >> > > >
> > > >> > > > For the current FLIP-72, I would separate it into the sink
> part
> > > for
> > > >> > > > current work and keep the source part as future works until we
> > > reach
> > > >> > > > FLIP-27 finals.
> > > >> > > >
> > > >> > > > I also replied to some of the comments in the design doc. I will
> > > >> rewrite
> > > >> > > the
> > > >> > > > catalog part according to Bowen's advice in both email and
> > > >> comments.
> > > >> > > >
> > > >> > > > Thanks for the help again.
> > > >> > > >
> > > >> > > > Best,
> > > >> > > > Yijie
> > > >> > > >
> > > >> > > > On Fri, Oct 18, 2019 at 12:40 AM Rong Rong <
> walter...@gmail.com
> > >
> > > >> > wrote:
> > > >> > > >
> > > >> > > >> Hi Yijie,
> > > >> > > >>
> > > >> > > >> I also agree with Jark on separating the Catalog part into
> > > another
> > > >> > FLIP.
> > > >> > > >>
> > > >> > > >> With FLIP-27[1] also in the air, it is also probably great to
> > > split
> > > >> > and
> > > >> > > >> unblock the sink implementation contribution.
> > > >> > > >> I would suggest either putting in a detail implementation
> plan
> > > >> section
> > > >> > > in
> > > >> > > >> the doc, or (maybe too much separation?) splitting them into
> > > >> different
> > > >> > > >> FLIPs. What do you guys think?
> > > >> > > >>
> > > >> > > >> --
> > > >> > > >> Rong
> > > >> > > >>
> > > >> > > >> [1]
> > > >> > > >>
> > > >> > > >>
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> 

Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

2020-01-16 Thread Jingsong Li
Hi Stephan,

Thanks for starting this discussion.
+1 for storing timers in RocksDB by default.
In the past, when Flink didn't keep timers in RocksDB, I had a
headache: I always had to adjust parameters carefully to ensure there was
no risk of running out of memory.

Just curious, how big is the performance impact of heap vs. RocksDB timers?
- if there is no order-of-magnitude difference between heap and RocksDB,
there is no problem in using RocksDB.
- if there is, maybe we should improve our documentation to let users know
about this option. (It looks like a lot of users didn't know.)

Best,
Jingsong Lee

On Fri, Jan 17, 2020 at 3:18 AM Yun Tang  wrote:

> Hi Stephan,
>
> I am +1 for the change which stores timers in RocksDB by default.
>
> Some users hope the checkpoint can complete as fast as possible, which
> also requires timers to be stored in RocksDB so that they do not affect
> the synchronous part of the checkpoint.
>
> Best
> Yun Tang
> --
> *From:* Andrey Zagrebin 
> *Sent:* Friday, January 17, 2020 0:07
> *To:* Stephan Ewen 
> *Cc:* dev ; user 
> *Subject:* Re: [DISCUSS] Change default for RocksDB timers: Java Heap =>
> in RocksDB
>
> Hi Stephan,
>
> Thanks for starting this discussion. I am +1 for this change.
> In general, the number of timer state keys can be of the same order as the
> number of main state keys.
> So if RocksDB is used for the main state for scalability, it makes sense to
> have timers there as well,
> unless timers are used for only a very limited subset of keys which fits
> into memory.
>
> Best,
> Andrey
>
> On Thu, Jan 16, 2020 at 4:27 PM Stephan Ewen  wrote:
>
> Hi all!
>
> I would suggest a change of the current default for timers. A bit of
> background:
>
>   - Timers (for windows, process functions, etc.) are state that is
> managed and checkpointed as well.
>   - When using the MemoryStateBackend and the FsStateBackend, timers are
> kept on the JVM heap, like regular state.
>   - When using the RocksDBStateBackend, timers can be kept in RocksDB
> (like other state) or on the JVM heap. The JVM heap is the default though!
>
> I find this a bit un-intuitive and would propose to change this to let the
> RocksDBStateBackend store all state in RocksDB by default.
> The rationale being that if there is a tradeoff (like here), safe and
> scalable should be the default and unsafe performance be an explicit choice.
>
> This sentiment seems to be shared by various users as well, see
> https://twitter.com/StephanEwen/status/1214590846168903680 and
> https://twitter.com/StephanEwen/status/1214594273565388801
> We would of course keep the switch and mention in the performance tuning
> section that this is an option.
>
> # RocksDB State Backend Timers on Heap
>   - Pro: faster
>   - Con: not memory safe, GC overhead, longer synchronous checkpoint time,
> no incremental checkpoints
>
> # RocksDB State Backend Timers in RocksDB
>   - Pro: safe and scalable, asynchronously and incrementally checkpointed
>   - Con: performance overhead.
>
> Please chime in and let me know what you think.
>
> Best,
> Stephan
>
>

-- 
Best, Jingsong Lee


[jira] [Created] (FLINK-15624) Pulsar Sink

2020-01-16 Thread Yijie Shen (Jira)
Yijie Shen created FLINK-15624:
--

 Summary: Pulsar Sink
 Key: FLINK-15624
 URL: https://issues.apache.org/jira/browse/FLINK-15624
 Project: Flink
  Issue Type: Sub-task
Reporter: Yijie Shen






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread hailongwang


Congratulations Dian !


Best,
Hailong Wang









On 2020-01-16 21:15:34, "Congxian Qiu" wrote:

Congratulations Dian Fu


Best,
Congxian




Jark Wu wrote on Thu, Jan 16, 2020 at 7:44 PM:

Congratulations Dian and welcome on board!

Best,
Jark

On Thu, 16 Jan 2020 at 19:32, Jingsong Li  wrote:

> Congratulations Dian Fu. Well deserved!
>
> Best,
> Jingsong Lee
>
> On Thu, Jan 16, 2020 at 6:26 PM jincheng sun 
> wrote:
>
>> Congrats Dian Fu and welcome on board!
>>
>> Best,
>> Jincheng
>>
>> Shuo Cheng wrote on Thu, Jan 16, 2020 at 6:22 PM:
>>
>>> Congratulations!  Dian Fu
>>>
>>> > Xingbo, Wei Zhong wrote on Thu, Jan 16, 2020 at 6:13 PM; jincheng sun
>>> wrote on Thu, Jan 16, 2020 at 5:58 PM:
>>>
>>
>
> --
> Best, Jingsong Lee
>


Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread aihua li
Congratulations!  Dian Fu

> On Jan 16, 2020, at 6:22 PM, Shuo Cheng wrote:
> 
> Congratulations!  Dian Fu



Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Bowen Li
Congrats!

On Thu, Jan 16, 2020 at 13:45 Peter Huang 
wrote:

> Congratulations, Dian!
>
>
> Best Regards
> Peter Huang
>
> On Thu, Jan 16, 2020 at 11:04 AM Yun Tang  wrote:
>
>> Congratulations, Dian!
>>
>> Best
>> Yun Tang
>> --
>> *From:* Benchao Li 
>> *Sent:* Thursday, January 16, 2020 22:27
>> *To:* Congxian Qiu 
>> *Cc:* dev@flink.apache.org ; Jingsong Li <
>> jingsongl...@gmail.com>; jincheng sun ; Shuo
>> Cheng ; Xingbo Huang ; Wei Zhong
>> ; Hequn Cheng ; Leonard Xu
>> ; Jeff Zhang ; user <
>> u...@flink.apache.org>; user-zh 
>> *Subject:* Re: [ANNOUNCE] Dian Fu becomes a Flink committer
>>
>> Congratulations Dian.
>>
>> Congxian Qiu wrote on Thu, Jan 16, 2020 at 10:15 PM:
>>
>> > Congratulations Dian Fu
>> >
>> > Best,
>> > Congxian
>> >
>> >
>> > Jark Wu wrote on Thu, Jan 16, 2020 at 7:44 PM:
>> >
>> >> Congratulations Dian and welcome on board!
>> >>
>> >> Best,
>> >> Jark
>> >>
>> >> On Thu, 16 Jan 2020 at 19:32, Jingsong Li 
>> wrote:
>> >>
>> >> > Congratulations Dian Fu. Well deserved!
>> >> >
>> >> > Best,
>> >> > Jingsong Lee
>> >> >
>> >> > On Thu, Jan 16, 2020 at 6:26 PM jincheng sun <
>> sunjincheng...@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Congrats Dian Fu and welcome on board!
>> >> >>
>> >> >> Best,
>> >> >> Jincheng
>> >> >>
>> >> >> Shuo Cheng wrote on Thu, Jan 16, 2020 at 6:22 PM:
>> >> >>
>> >> >>> Congratulations!  Dian Fu
>> >> >>>
>> >> >>> > Xingbo, Wei Zhong wrote on Thu, Jan 16, 2020 at 6:13 PM; jincheng sun
>> >> >>> wrote on Thu, Jan 16, 2020 at 5:58 PM:
>> >> >>>
>> >> >>
>> >> >
>> >> > --
>> >> > Best, Jingsong Lee
>> >> >
>> >>
>> >
>>
>> --
>>
>> Benchao Li
>> School of Electronics Engineering and Computer Science, Peking University
>> Tel:+86-15650713730
>> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>>
>


Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Peter Huang
Congratulations, Dian!


Best Regards
Peter Huang

On Thu, Jan 16, 2020 at 11:04 AM Yun Tang  wrote:

> Congratulations, Dian!
>
> Best
> Yun Tang
> --
> *From:* Benchao Li 
> *Sent:* Thursday, January 16, 2020 22:27
> *To:* Congxian Qiu 
> *Cc:* dev@flink.apache.org ; Jingsong Li <
> jingsongl...@gmail.com>; jincheng sun ; Shuo
> Cheng ; Xingbo Huang ; Wei Zhong <
> weizhong0...@gmail.com>; Hequn Cheng ; Leonard Xu <
> xbjt...@gmail.com>; Jeff Zhang ; user <
> u...@flink.apache.org>; user-zh 
> *Subject:* Re: [ANNOUNCE] Dian Fu becomes a Flink committer
>
> Congratulations Dian.
>
> Congxian Qiu wrote on Thu, Jan 16, 2020 at 10:15 PM:
>
> > Congratulations Dian Fu
> >
> > Best,
> > Congxian
> >
> >
> > Jark Wu wrote on Thu, Jan 16, 2020 at 7:44 PM:
> >
> >> Congratulations Dian and welcome on board!
> >>
> >> Best,
> >> Jark
> >>
> >> On Thu, 16 Jan 2020 at 19:32, Jingsong Li 
> wrote:
> >>
> >> > Congratulations Dian Fu. Well deserved!
> >> >
> >> > Best,
> >> > Jingsong Lee
> >> >
> >> > On Thu, Jan 16, 2020 at 6:26 PM jincheng sun <
> sunjincheng...@gmail.com>
> >> > wrote:
> >> >
> >> >> Congrats Dian Fu and welcome on board!
> >> >>
> >> >> Best,
> >> >> Jincheng
> >> >>
> >> >> Shuo Cheng wrote on Thu, Jan 16, 2020 at 6:22 PM:
> >> >>
> >> >>> Congratulations!  Dian Fu
> >> >>>
> >> >>> > Xingbo, Wei Zhong wrote on Thu, Jan 16, 2020 at 6:13 PM; jincheng sun
> >> >>> wrote on Thu, Jan 16, 2020 at 5:58 PM:
> >> >>>
> >> >>
> >> >
> >> > --
> >> > Best, Jingsong Lee
> >> >
> >>
> >
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>


Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

2020-01-16 Thread Yun Tang
Hi Stephan,

I am +1 for the change which stores timers in RocksDB by default.

Some users hope the checkpoint can complete as fast as possible, which also 
requires timers to be stored in RocksDB so that they do not affect the 
synchronous part of the checkpoint.

Best
Yun Tang

From: Andrey Zagrebin 
Sent: Friday, January 17, 2020 0:07
To: Stephan Ewen 
Cc: dev ; user 
Subject: Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in 
RocksDB

Hi Stephan,

Thanks for starting this discussion. I am +1 for this change.
In general, the number of timer state keys can be of the same order as the 
number of main state keys.
So if RocksDB is used for the main state for scalability, it makes sense to 
have timers there as well, 
unless timers are used for only a very limited subset of keys which fits into 
memory.

Best,
Andrey

On Thu, Jan 16, 2020 at 4:27 PM Stephan Ewen 
mailto:se...@apache.org>> wrote:
Hi all!

I would suggest a change of the current default for timers. A bit of background:

  - Timers (for windows, process functions, etc.) are state that is managed and 
checkpointed as well.
  - When using the MemoryStateBackend and the FsStateBackend, timers are kept 
on the JVM heap, like regular state.
  - When using the RocksDBStateBackend, timers can be kept in RocksDB (like 
other state) or on the JVM heap. The JVM heap is the default though!

I find this a bit un-intuitive and would propose to change this to let the 
RocksDBStateBackend store all state in RocksDB by default.
The rationale being that if there is a tradeoff (like here), safe and scalable 
should be the default and unsafe performance be an explicit choice.

This sentiment seems to be shared by various users as well, see 
https://twitter.com/StephanEwen/status/1214590846168903680 and 
https://twitter.com/StephanEwen/status/1214594273565388801
We would of course keep the switch and mention in the performance tuning 
section that this is an option.

# RocksDB State Backend Timers on Heap
  - Pro: faster
  - Con: not memory safe, GC overhead, longer synchronous checkpoint time, no 
incremental checkpoints

# RocksDB State Backend Timers in RocksDB
  - Pro: safe and scalable, asynchronously and incrementally checkpointed
  - Con: performance overhead.

Please chime in and let me know what you think.

Best,
Stephan



Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Yun Tang
Congratulations, Dian!

Best
Yun Tang

From: Benchao Li 
Sent: Thursday, January 16, 2020 22:27
To: Congxian Qiu 
Cc: dev@flink.apache.org ; Jingsong Li 
; jincheng sun ; Shuo Cheng 
; Xingbo Huang ; Wei Zhong 
; Hequn Cheng ; Leonard Xu 
; Jeff Zhang ; user 
; user-zh 
Subject: Re: [ANNOUNCE] Dian Fu becomes a Flink committer

Congratulations Dian.

Congxian Qiu wrote on Thu, Jan 16, 2020 at 10:15 PM:

> Congratulations Dian Fu
>
> Best,
> Congxian
>
>
> Jark Wu wrote on Thu, Jan 16, 2020 at 7:44 PM:
>
>> Congratulations Dian and welcome on board!
>>
>> Best,
>> Jark
>>
>> On Thu, 16 Jan 2020 at 19:32, Jingsong Li  wrote:
>>
>> > Congratulations Dian Fu. Well deserved!
>> >
>> > Best,
>> > Jingsong Lee
>> >
>> > On Thu, Jan 16, 2020 at 6:26 PM jincheng sun 
>> > wrote:
>> >
>> >> Congrats Dian Fu and welcome on board!
>> >>
>> >> Best,
>> >> Jincheng
>> >>
>> >> Shuo Cheng wrote on Thu, Jan 16, 2020 at 6:22 PM:
>> >>
>> >>> Congratulations!  Dian Fu
>> >>>
>> >>> > Xingbo, Wei Zhong wrote on Thu, Jan 16, 2020 at 6:13 PM; jincheng sun
>> >>> wrote on Thu, Jan 16, 2020 at 5:58 PM:
>> >>>
>> >>
>> >
>> > --
>> > Best, Jingsong Lee
>> >
>>
>

--

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn


[jira] [Created] (FLINK-15623) Building flink-python with maven profile docs-and-source fails

2020-01-16 Thread Gary Yao (Jira)
Gary Yao created FLINK-15623:


 Summary: Building flink-python with maven profile docs-and-source 
fails
 Key: FLINK-15623
 URL: https://issues.apache.org/jira/browse/FLINK-15623
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 1.10.0
 Environment: rev: 91d96abe5f42bd088a326870b4885d79611fccb5
Reporter: Gary Yao
 Fix For: 1.10.0


*Description*
Building flink-python with maven profile docs-and-source fails due to 
checkstyle violations. 

*How to reproduce*

Running

{noformat}
mvn clean install -pl flink-python -Pdocs-and-source -DskipTests 
-DretryFailedDeploymentCount=10
{noformat}

should fail with the following error

{noformat}
[...]
[ERROR] 
generated-sources/org/apache/flink/fnexecution/v1/FlinkFnApi.java:[8343] 
(regexp) RegexpSinglelineJava: Line has leading space characters; indentation 
should be performed with tabs only.
[ERROR] 
generated-sources/org/apache/flink/fnexecution/v1/FlinkFnApi.java:[8344] 
(regexp) RegexpSinglelineJava: Line has leading space characters; indentation 
should be performed with tabs only.
[ERROR] 
generated-sources/org/apache/flink/fnexecution/v1/FlinkFnApi.java:[8345] 
(regexp) RegexpSinglelineJava: Line has leading space characters; indentation 
should be performed with tabs only.
[ERROR] 
generated-sources/org/apache/flink/fnexecution/v1/FlinkFnApi.java:[8346] 
(regexp) RegexpSinglelineJava: Line has leading space characters; indentation 
should be performed with tabs only.
[ERROR] 
generated-sources/org/apache/flink/fnexecution/v1/FlinkFnApi.java:[8347] 
(regexp) RegexpSinglelineJava: Line has leading space characters; indentation 
should be performed with tabs only.
[ERROR] 
generated-sources/org/apache/flink/fnexecution/v1/FlinkFnApi.java:[8348] 
(regexp) RegexpSinglelineJava: Line has leading space characters; indentation 
should be performed with tabs only.
[ERROR] 
generated-sources/org/apache/flink/fnexecution/v1/FlinkFnApi.java:[8349] 
(regexp) RegexpSinglelineJava: Line has leading space characters; indentation 
should be performed with tabs only.
[ERROR] 
generated-sources/org/apache/flink/fnexecution/v1/FlinkFnApi.java:[8350] 
(regexp) RegexpSinglelineJava: Line has leading space characters; indentation 
should be performed with tabs only.
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 18.046 s
[INFO] Finished at: 2020-01-16T16:44:01+00:00
[INFO] Final Memory: 158M/2826M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check (validate) on 
project flink-python_2.11: You have 7603 Checkstyle violations. -> [Help 1]
[ERROR]
{noformat}
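One possible remedy (an assumption on my side, not a decided fix): since every reported violation points at generated-sources, the flink-python module could exclude generated code from the checkstyle run. A hedged sketch of the maven-checkstyle-plugin configuration in the module's pom.xml:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-checkstyle-plugin</artifactId>
  <configuration>
    <!-- Hypothetical exclusion pattern: skip the protobuf-generated
         FlinkFnApi.java and friends, which do not follow the tab rule. -->
    <excludes>**/generated-sources/**</excludes>
  </configuration>
</plugin>
```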



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15622) Clarify plugin mechanism in documentation

2020-01-16 Thread Seth Wiesman (Jira)
Seth Wiesman created FLINK-15622:


 Summary: Clarify plugin mechanism in documentation
 Key: FLINK-15622
 URL: https://issues.apache.org/jira/browse/FLINK-15622
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, FileSystems
Reporter: Seth Wiesman


The flink documentation has passing references to the new plugin architecture 
released in Flink 1.9 but nothing concrete. It is not immediately clear:

1) What a plugin is
2) How it affects the bundling of dependencies





Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

2020-01-16 Thread Andrey Zagrebin
Hi Stephan,

Thanks for starting this discussion. I am +1 for this change.
In general, the number of timer state keys can be of the same order as the
number of main state keys.
So if RocksDB is used for the main state for scalability, it makes sense to
have the timers there as well,
unless timers are used for only a very limited subset of keys that fits into
memory.

Best,
Andrey

On Thu, Jan 16, 2020 at 4:27 PM Stephan Ewen  wrote:

> Hi all!
>
> I would suggest a change of the current default for timers. A bit of
> background:
>
>   - Timers (for windows, process functions, etc.) are state that is
> managed and checkpointed as well.
>   - When using the MemoryStateBackend and the FsStateBackend, timers are
> kept on the JVM heap, like regular state.
>   - When using the RocksDBStateBackend, timers can be kept in RocksDB
> (like other state) or on the JVM heap. The JVM heap is the default though!
>
> I find this a bit unintuitive and would propose to change this to let the
> RocksDBStateBackend store all state in RocksDB by default.
> The rationale being that if there is a tradeoff (like here), safe and
> scalable should be the default and unsafe performance be an explicit choice.
>
> This sentiment seems to be shared by various users as well, see
> https://twitter.com/StephanEwen/status/1214590846168903680 and
> https://twitter.com/StephanEwen/status/1214594273565388801
> We would of course keep the switch and mention in the performance tuning
> section that this is an option.
>
> # RocksDB State Backend Timers on Heap
>   - Pro: faster
>   - Con: not memory safe, GC overhead, longer synchronous checkpoint time,
> no incremental checkpoints
>
> # RocksDB State Backend Timers in RocksDB
>   - Pro: safe and scalable, asynchronously and incrementally checkpointed
>   - Con: performance overhead.
>
> Please chime in and let me know what you think.
>
> Best,
> Stephan
>
>


[jira] [Created] (FLINK-15621) State TTL: Remove deprecated option and method to disable TTL compaction filter

2020-01-16 Thread Andrey Zagrebin (Jira)
Andrey Zagrebin created FLINK-15621:
---

 Summary: State TTL: Remove deprecated option and method to disable 
TTL compaction filter
 Key: FLINK-15621
 URL: https://issues.apache.org/jira/browse/FLINK-15621
 Project: Flink
  Issue Type: Task
Reporter: Andrey Zagrebin
 Fix For: 1.11.0


Follow-up for FLINK-15506.
 * Remove RocksDBOptions#TTL_COMPACT_FILTER_ENABLED
 * Remove 
RocksDBStateBackend#enableTtlCompactionFilter/isTtlCompactionFilterEnabled/disableTtlCompactionFilter,
 also in python API
 * Cleanup code from this flag and tests, also in python API
 * Check any related code in [frocksdb|https://github.com/dataArtisans/frocksdb], if any





[jira] [Created] (FLINK-15620) State TTL: Remove deprecated enable default background cleanup

2020-01-16 Thread Andrey Zagrebin (Jira)
Andrey Zagrebin created FLINK-15620:
---

 Summary: State TTL: Remove deprecated enable default background 
cleanup
 Key: FLINK-15620
 URL: https://issues.apache.org/jira/browse/FLINK-15620
 Project: Flink
  Issue Type: Task
  Components: Runtime / State Backends
Reporter: Andrey Zagrebin
 Fix For: 1.11.0








Re: [DISCUSS] Add N-Ary Stream Operator

2020-01-16 Thread Piotr Nowojski
Hi,

Good question. I think not, at least not in the first version, unless someone 
can convince us that this is better to do immediately. 

Piotrek

> On 14 Jan 2020, at 04:49, Yun Tang  wrote:
> 
> Hi 
> 
> I noticed that previous design doc [1] also talked about the topic of 
> introducing new KeyedStreamOperatorNG, I wonder is that a must-do to 
> introduce N-ary stream operator?
> 
> 
> [1] 
> https://docs.google.com/document/d/1ZFzL_0xGuUEnBsFyEiHwWcmCcjhd9ArWsmhrgnt05RI
>  
> 
> 
> Best
> Yun Tang
> From: Piotr Nowojski <pi...@ververica.com>
> Sent: Thursday, January 9, 2020 23:27
> To: dev <dev@flink.apache.org>
> Subject: Re: [DISCUSS] Add N-Ary Stream Operator
>  
> Hi,
> 
> I have started a vote on this topic [1], please cast your +1 or -1 there :)
> 
> Also I assigned FLIP-92 number to this design doc.
> 
> Piotrek
> 
> [1] 
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-92-Add-N-Ary-Stream-Operator-in-Flink-td36539.html
>  
> 
> 
> > On 10 Dec 2019, at 07:10, Jingsong Li wrote:
> > 
> > Hi Piotr,
> > 
> > Sorry for the misunderstanding, chaining does work with multiple output
> > right now, I mean, it's also a very important feature, and it should work
> > with N-ary selectable input operators.
> > We all think that providing N-ary selectable input operator is a very
> > important thing, it makes TwoInputOperator chaining possible in upper
> > layer, and it makes things simpler.
> > 
> > Looking forward to it very much.
> > 
> > Best,
> > Jingsong Lee
> > 
> > On Thu, Dec 5, 2019 at 6:01 PM Piotr Nowojski wrote:
> > 
> >> Hi,
> >> 
> >> Thanks for the clarifications Jingsong. Indeed, if chaining doesn’t work
> >> with multiple output right now (doesn’t it?), that’s also a good future
> >> story.
> >> 
> >> Re Kurt:
> >> I think this pattern could be easily handled if those two joins are
> >> implemented as a single 3 input operator, that internally is composed of
> >> those three operators.
> >> 1. You can set the initial InputSelection to Build1 and Build2.
> >> 2. When Build1 receives `endOfInput`, InputSelection switches to Probe1
> >> and Build2.
> >> 3. When Probe1 receives `endOfInput`, you do not forward the `endOfInput`
> >> to the internal `HashAgg` operator
> >> 4. When Build2 finally receives `endOfInput`, you can finally forward the
> >> `endOfInput` to the internal `HashAgg`
> >> 
> >> Exactly for reasons like that, I wanted to at least post pone handling
> >> tree-like operator chains in the Flink. Logic like that is difficult to
> >> express generically, since it requires the knowledge about the operators
> >> behaviour. While when hardcoded for the specific project (Blink in this
> >> case) and encapsulated behind N-ary selectable input operator, it’s very
> >> easy to handle by the runtime. Sure, at the expense of a bit more
> >> complexity in forcing the user to compose operators, that’s why I’m not
> >> saying that we do not want to handle this at some point in the future, but
> >> at least not in the first version.
> >> 
> >> Piotrek
> >> 
> >>> On 5 Dec 2019, at 10:11, Jingsong Li wrote:
> >>> 
> >>> Kurt mentioned a very interesting thing,
> >>> 
> >>> If we want to better performance to read simultaneously, To this pattern:
> >>> We need to control not only the read order of inputs, but also the
> >> outputs
> >>> of endInput.
> >>> In this case, HashAggregate can only call its real endInput after the
> >> input
> >>> of build2 is finished, so the endInput of an operator is not necessarily
> >>> determined by its input, but also by other associated inputs.
> >>> I think we have the ability to do this in the n-input operator.
> >>> 
> >>> Note that these behaviors should be determined at compile time.
> >>> 
> >>> Best,
> >>> Jingsong Lee
> >>> 
> >>> On Thu, Dec 5, 2019 at 4:42 PM Kurt Young wrote:
> >>> 
>  During implementing n-ary input operator in table, please keep
>  this pattern in mind:
>  
>  Build1 ---+
>            |
>            +---> HashJoin1 ---> HashAgg ---+
>            |                               |
>  Probe1 ---+                               +---> HashJoin2
>                                            |
>  Build2 -----------------------------------+
>  
>  It's quite interesting that both `Build1`, `Build2` and `Probe1` can
> 

[DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

2020-01-16 Thread Stephan Ewen
Hi all!

I would suggest a change of the current default for timers. A bit of
background:

  - Timers (for windows, process functions, etc.) are state that is managed
and checkpointed as well.
  - When using the MemoryStateBackend and the FsStateBackend, timers are
kept on the JVM heap, like regular state.
  - When using the RocksDBStateBackend, timers can be kept in RocksDB (like
other state) or on the JVM heap. The JVM heap is the default though!

I find this a bit unintuitive and would propose to change this to let the
RocksDBStateBackend store all state in RocksDB by default.
The rationale being that if there is a tradeoff (like here), safe and
scalable should be the default and unsafe performance be an explicit choice.

This sentiment seems to be shared by various users as well, see
https://twitter.com/StephanEwen/status/1214590846168903680 and
https://twitter.com/StephanEwen/status/1214594273565388801
We would of course keep the switch and mention in the performance tuning
section that this is an option.

# RocksDB State Backend Timers on Heap
  - Pro: faster
  - Con: not memory safe, GC overhead, longer synchronous checkpoint time,
no incremental checkpoints

# RocksDB State Backend Timers in RocksDB
  - Pro: safe and scalable, asynchronously and incrementally checkpointed
  - Con: performance overhead.

Please chime in and let me know what you think.

Best,
Stephan


Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Benchao Li
Congratulations Dian.

Congxian Qiu  wrote on Thu, Jan 16, 2020 at 10:15 PM:

> Congratulations Dian Fu
>
> Best,
> Congxian
>
>
> Jark Wu  wrote on Thu, Jan 16, 2020 at 7:44 PM:
>
>> Congratulations Dian and welcome on board!
>>
>> Best,
>> Jark
>>
>> On Thu, 16 Jan 2020 at 19:32, Jingsong Li  wrote:
>>
>> > Congratulations Dian Fu. Well deserved!
>> >
>> > Best,
>> > Jingsong Lee
>> >
>> > On Thu, Jan 16, 2020 at 6:26 PM jincheng sun 
>> > wrote:
>> >
>> >> Congrats Dian Fu and welcome on board!
>> >>
>> >> Best,
>> >> Jincheng
>> >>
>> >> Shuo Cheng  wrote on Thu, Jan 16, 2020 at 6:22 PM:
>> >>
>> >>> Congratulations!  Dian Fu
>> >>>
>> >>> > Xingbo Wei Zhong  wrote on Thu, Jan 16, 2020 at 6:13 PM:  jincheng sun
>> >>> wrote on Thu, Jan 16, 2020 at 5:58 PM:
>> >>>
>> >>
>> >
>> > --
>> > Best, Jingsong Lee
>> >
>>
>

-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn


Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Congxian Qiu
Congratulations Dian Fu

Best,
Congxian


Jark Wu  wrote on Thu, Jan 16, 2020 at 7:44 PM:

> Congratulations Dian and welcome on board!
>
> Best,
> Jark
>
> On Thu, 16 Jan 2020 at 19:32, Jingsong Li  wrote:
>
> > Congratulations Dian Fu. Well deserved!
> >
> > Best,
> > Jingsong Lee
> >
> > On Thu, Jan 16, 2020 at 6:26 PM jincheng sun 
> > wrote:
> >
> >> Congrats Dian Fu and welcome on board!
> >>
> >> Best,
> >> Jincheng
> >>
> >> Shuo Cheng  wrote on Thu, Jan 16, 2020 at 6:22 PM:
> >>
> >>> Congratulations!  Dian Fu
> >>>
> >>> > Xingbo Wei Zhong  wrote on Thu, Jan 16, 2020 at 6:13 PM:  jincheng sun
> >>> wrote on Thu, Jan 16, 2020 at 5:58 PM:
> >>>
> >>
> >
> > --
> > Best, Jingsong Lee
> >
>


Re: Frequently checkpoint failure, could make the flink sql state not clear?

2020-01-16 Thread Congxian Qiu
Hi,

AFAIK, whether a timer will fire is independent of whether checkpoints succeed.

Best,
Congxian


LakeShen  wrote on Thu, Jan 16, 2020 at 8:53 PM:

> Hi community, I am using Flink SQL and have set the state retention time. As
> far as I know, Flink sets a timer for each key to clear its state. If a Flink
> task's checkpoints always fail, will the key state still be cleared by the
> timers?
> Thanks for your reply.
>


[jira] [Created] (FLINK-15619) GroupWindowTableAggregateITCase.testAllProcessingTimeTumblingGroupWindowOverCount failed on Azure

2020-01-16 Thread Congxian Qiu(klion26) (Jira)
Congxian Qiu(klion26) created FLINK-15619:
-

 Summary: 
GroupWindowTableAggregateITCase.testAllProcessingTimeTumblingGroupWindowOverCount
 failed on Azure
 Key: FLINK-15619
 URL: https://issues.apache.org/jira/browse/FLINK-15619
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.10.0
Reporter: Congxian Qiu(klion26)


2020-01-16T08:32:11.0214825Z [ERROR] testAllProcessingTimeTumblingGroupWindowOverCount[StateBackend=HEAP](org.apache.flink.table.planner.runtime.stream.table.GroupWindowTableAggregateITCase)  Time elapsed: 2.213 s <<< ERROR!
2020-01-16T08:32:11.0223298Z org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
2020-01-16T08:32:11.0241857Z at org.apache.flink.table.planner.runtime.stream.table.GroupWindowTableAggregateITCase.testAllProcessingTimeTumblingGroupWindowOverCount(GroupWindowTableAggregateITCase.scala:130)
2020-01-16T08:32:11.0261909Z Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=1, backoffTimeMS=0)
2020-01-16T08:32:11.0274347Z Caused by: java.lang.Exception: Artificial Failure
2020-01-16T08:32:11.0291664Z

 

[https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_apis/build/builds/4391/logs/16]





[jira] [Created] (FLINK-15618) Remove useless JobTimeoutException

2020-01-16 Thread vinoyang (Jira)
vinoyang created FLINK-15618:


 Summary: Remove useless JobTimeoutException
 Key: FLINK-15618
 URL: https://issues.apache.org/jira/browse/FLINK-15618
 Project: Flink
  Issue Type: Wish
Reporter: vinoyang


Currently, the exception class {{JobTimeoutException}} has not been used 
anywhere. IMO, we can remove it from Flink codebase.





[jira] [Created] (FLINK-15617) Remove useless JobRetrievalException

2020-01-16 Thread vinoyang (Jira)
vinoyang created FLINK-15617:


 Summary: Remove useless JobRetrievalException
 Key: FLINK-15617
 URL: https://issues.apache.org/jira/browse/FLINK-15617
 Project: Flink
  Issue Type: Wish
Reporter: vinoyang


Currently, the exception class {{JobRetrievalException}} has not been used 
anywhere in Flink codebase. IMO, we can remove it.





Frequently checkpoint failure, could make the flink sql state not clear?

2020-01-16 Thread LakeShen
Hi community, I am using Flink SQL and have set the state retention time. As
far as I know, Flink sets a timer for each key to clear its state. If a Flink
task's checkpoints always fail, will the key state still be cleared by the
timers?
Thanks for your reply.


Re: [DISCUSS] Make AppendingState#add refuse to add null element

2020-01-16 Thread Aljoscha Krettek
This is mostly a remnant from the previous state API, see [1] for 
reference. The behaviour was basically copied to the new state 
implementations, which was a mistake, in hindsight. Also see [2] where I 
added AppendingState, here I also didn't document any special null 
behaviour.


Best,
Aljoscha

[1] 
https://github.com/apache/flink/commit/caf46728045c0b886e6d4ec0aa429a830740a391
[2] 
https://github.com/apache/flink/commit/6cd8ceb10c841827cf89b74ecf5a0495a6933d53


On 16.01.20 04:13, Yun Tang wrote:

+1 for unifying the behavior of AppendingState#add .

However, I have a concern about the usage of the window reducing function [1]:
I'm not sure whether users rely on processing StreamRecord(null) to clear state.
As you can see, the user cannot see the reducing window state directly, and the
only way to communicate with the state is via processing records.

I'm not sure whether this is by design, @Aljoscha 
Krettek  would you please share the initial idea 
when introducing this for the first time?


[1] 
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#reducefunction

Best
Yun Tang


From: Yu Li 
Sent: Thursday, January 9, 2020 14:09
To: dev 
Subject: Re: [DISCUSS] Make AppendingState#add refuse to add null element

+1 for unifying the behavior to refusing adding null element. Nice catch
and thanks for bringing up the discussion!

Best Regards,
Yu


On Wed, 8 Jan 2020 at 22:50, Aljoscha Krettek  wrote:


Hi,

As I said in the discussion on the Jira issue, I’m in favour of this
change!

This is the Jira Issue, for reference:
https://issues.apache.org/jira/browse/FLINK-15424

Best,
Aljoscha


On 8. Jan 2020, at 15:16, Congxian Qiu  wrote:

Dear All


Currently, we found the implementations of AppendingState#add are not the
same, taking some as example:

   - HeapReducingState will clear state if add null element
   - RocksDBReducingState will add null element if serializer can

serialize

   null
   - Both HeapListState and RocksDBListState refuse to add null element —
   will throw NullPointException


we think this need to be fixed, and possible solutions include:

   1. Respect the current java doc, which said “If null is passed in, the
   state value will remain unchanged”
   2. Make all AppendingState#add refuse to add null element


We propose to apply the second solution, following the recommendation in
Guava[1].


Would love to hear your thoughts. Thanks.


Regards,

Congxian


[1] https://github.com/google/guava/wiki/UsingAndAvoidingNullExplained
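To make the proposed contract concrete, here is a small self-contained sketch (plain Java; the class and method names are illustrative, not Flink's real state implementations) of solution 2, where add() refuses null up front instead of clearing state or serializing null:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/**
 * Toy stand-in for the proposed AppendingState#add contract:
 * adding null always fails fast with a NullPointerException.
 */
public class AppendingStateDemo {

    private final List<String> elements = new ArrayList<>();

    /** Appends a value; refuses null, as proposed in solution 2. */
    public void add(String value) {
        Objects.requireNonNull(value, "You cannot add null to an AppendingState.");
        elements.add(value);
    }

    public List<String> get() {
        return elements;
    }

    public static void main(String[] args) {
        AppendingStateDemo state = new AppendingStateDemo();
        state.add("a");
        state.add("b");
        System.out.println(state.get()); // prints [a, b]
        try {
            state.add(null);
        } catch (NullPointerException expected) {
            System.out.println("null was refused"); // prints null was refused
        }
    }
}
```

Under this contract, HeapReducingState's silent clear-on-null and RocksDBReducingState's serialize-null behaviors both become impossible to hit by accident.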







Re: [DISCUSS] Improve TableFactory

2020-01-16 Thread Jingsong Li
Thanks Bowen and Timo for involving.

Hi Bowen,

> 1. is it better to have explicit APIs like "createBatchTableSource(...)"
I think it is better to keep one method, since in [1] we reached consensus
in the DataStream layer to maintain a single API in "env.source". I think it
is good not to split batch and stream, and our TableSource/TableSink are the
same classes for both batch and streaming too.

> 2. I'm not sure of the benefits to have a CatalogTableContext class.
As Timo said, we may have more parameters to add in the future. Take a look
at "AbstractRichFunction.RuntimeContext": it was extended little by little.

Hi Timo,

Your suggestion about Context looks good to me.
"TablePath" is used in Hive for updating the catalog information of this
table. Yes, "ObjectIdentifier" looks better than "ObjectPath".

> Can we postpone the change of TableValidators?
Yes, ConfigOption validation looks good to me. It seems that you have been
thinking about this for a long time. It's very good. Looking forward to the
promotion of FLIP-54.

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-27-Refactor-Source-Interface-td24952i80.html#a36692

Best,
Jingsong Lee

On Thu, Jan 16, 2020 at 6:01 PM Timo Walther  wrote:

> Hi Jingsong,
>
> +1 for adding a context in the source and sink factories. A context
> class also allows for future modifications without touching the
> TableFactory interface again.
>
> How about:
>
> interface TableSourceFactory {
>  interface Context {
> // ...
>  }
> }
>
> Because I find the name `CatalogTableContext` confusing and we can bound
> the interface to the factory class itself as an inner interface.
>
> Readable access to configuration sounds also right to me. Can we remove
> the `ObjectPath getTablePath()` method? I don't see a reason why a
> factory should know the path. And if so, it should be an
> `ObjectIdentifier` instead to also know about the catalog we are using.
>
> The `isStreamingMode()` should be renamed to `isBounded()` because we
> would like to use terminology around boundedness rather than
> streaming/batch.
>
> @Bowen: We are in the process of unifying the APIs and thus explicitly
> avoid specialized methods in the future.
>
> Can we postpone the change of TableValidators? I don't think that every
> factory needs a schema validator. Ideally, the factory should just
> return a List or ConfigOptionGroup that contains the
> validation logic as mentioned in the validation part of FLIP-54[1]. But
> currently our config options are not rich enough to have a unified
> validation. Additionally, the factory should return some properties such
> as "supports event-time" for the schema validation outside of the
> factory itself.
>
> Regards,
> Timo
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-54%3A+Evolve+ConfigOption+and+Configuration
>
>
>
> On 16.01.20 00:51, Bowen Li wrote:
> > Hi Jingsong,
> >
> > The 1st and 2nd pain points you described are very valid, as I'm more
> > familiar with them. I agree these are shortcomings of the current Flink
> SQL
> > design.
> >
> > A couple comments on your 1st proposal:
> >
> > 1. is it better to have explicit APIs like "createBatchTableSource(...)"
> > and "createStreamingTableSource(...)" in TableSourceFactory (would be
> > similar for sink factory) to let planner handle which mode (streaming vs
> > batch) of source should be instantiated? That way we don't need to always
> > let connector developers handling an if-else on isStreamingMode.
> > 2. I'm not sure of the benefits to have a CatalogTableContext class. The
> > path, table, and config are fairly independent of each other. So why not
> > pass the config in as 3rd parameter as `createXxxTableSource(path,
> > catalogTable, tableConfig)?
> >
> >
> > On Tue, Jan 14, 2020 at 7:03 PM Jingsong Li 
> wrote:
> >
> >> Hi dev,
> >>
> >> I'd like to kick off a discussion on the improvement of
> TableSourceFactory
> >> and TableSinkFactory.
> >>
> >> Motivation:
> >> Now the main needs and problems are:
> >> 1.Connector can't get TableConfig [1], and some behaviors really need
> to be
> >> controlled by the user's table configuration. In the era of catalog, we
> >> can't put these config in connector properties, which is too
> inconvenient.
> >> 2.Connector can't know if this is batch or stream execution mode. But
> the
> >> sink implementation of batch and stream is totally different. I
> understand
> >> there is an update mode property now, but it splits the batch and
> stream in
> >> the catalog dimension. In fact, this information can be obtained through
> >> the current TableEnvironment.
> >> 3.No interface to call validation. Now our validation is more util
> classes.
> >> It depends on whether or not the connector calls. Now we have some new
> >> validations to add, such as [2], which is really confuse uses, even
> >> developers. Another problem is that our SQL update (DDL) does not have
> >> validation [3]. It is better to report an error 

[jira] [Created] (FLINK-15616) Move boot error messages from python-udf-boot.log to taskmanager's log file

2020-01-16 Thread Hequn Cheng (Jira)
Hequn Cheng created FLINK-15616:
---

 Summary: Move boot error messages from python-udf-boot.log to 
taskmanager's log file
 Key: FLINK-15616
 URL: https://issues.apache.org/jira/browse/FLINK-15616
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Hequn Cheng
Assignee: Hequn Cheng


Currently, the boot error messages are printed in a separate log file under 
FLINK_LOG_DIR, i.e., 
{{"$FLINK_LOG_DIR/flink-$USER-python-udf-boot-$HOSTNAME.log"}}. This additional 
file is very hard for users to locate, so it would be better to print the error 
messages directly into the taskmanager log file. 





Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Jark Wu
Congratulations Dian and welcome on board!

Best,
Jark

On Thu, 16 Jan 2020 at 19:32, Jingsong Li  wrote:

> Congratulations Dian Fu. Well deserved!
>
> Best,
> Jingsong Lee
>
> On Thu, Jan 16, 2020 at 6:26 PM jincheng sun 
> wrote:
>
>> Congrats Dian Fu and welcome on board!
>>
>> Best,
>> Jincheng
>>
>> Shuo Cheng  wrote on Thu, Jan 16, 2020 at 6:22 PM:
>>
>>> Congratulations!  Dian Fu
>>>
>>> > Xingbo Wei Zhong  wrote on Thu, Jan 16, 2020 at 6:13 PM:  jincheng sun
>>> wrote on Thu, Jan 16, 2020 at 5:58 PM:
>>>
>>
>
> --
> Best, Jingsong Lee
>


Re: [DISCUSS] Improve TableFactory

2020-01-16 Thread Jark Wu
Hi Jingsong,

I'm also +1 for adding a context in the source and sink factories.

Base on the Timo's proposal, IMO,

1) table path might be useful because it is a part of DDL and users might
use it as the source/sink display name in explainSource().
2) isStreamingMode vs isBounded: however, a bounded source can also be
executed in streaming mode. I think `isStreamingMode`
is fine because it's intuitive and EnvironmentSettings also uses this
terminology. My concern is should we consider the unified mode
maybe introduced in the future?

Best,
Jark



On Thu, 16 Jan 2020 at 18:01, Timo Walther  wrote:

> Hi Jingsong,
>
> +1 for adding a context in the source and sink factories. A context
> class also allows for future modifications without touching the
> TableFactory interface again.
>
> How about:
>
> interface TableSourceFactory {
>  interface Context {
> // ...
>  }
> }
>
> Because I find the name `CatalogTableContext` confusing and we can bound
> the interface to the factory class itself as an inner interface.
>
> Readable access to configuration sounds also right to me. Can we remove
> the `ObjectPath getTablePath()` method? I don't see a reason why a
> factory should know the path. And if so, it should be an
> `ObjectIdentifier` instead to also know about the catalog we are using.
>
> The `isStreamingMode()` should be renamed to `isBounded()` because we
> would like to use terminology around boundedness rather than
> streaming/batch.
>
> @Bowen: We are in the process of unifying the APIs and thus explicitly
> avoid specialized methods in the future.
>
> Can we postpone the change of TableValidators? I don't think that every
> factory needs a schema validator. Ideally, the factory should just
> return a List or ConfigOptionGroup that contains the
> validation logic as mentioned in the validation part of FLIP-54[1]. But
> currently our config options are not rich enough to have a unified
> validation. Additionally, the factory should return some properties such
> as "supports event-time" for the schema validation outside of the
> factory itself.
>
> Regards,
> Timo
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-54%3A+Evolve+ConfigOption+and+Configuration
>
>
>
> On 16.01.20 00:51, Bowen Li wrote:
> > Hi Jingsong,
> >
> > The 1st and 2nd pain points you described are very valid, as I'm more
> > familiar with them. I agree these are shortcomings of the current Flink
> SQL
> > design.
> >
> > A couple comments on your 1st proposal:
> >
> > 1. is it better to have explicit APIs like "createBatchTableSource(...)"
> > and "createStreamingTableSource(...)" in TableSourceFactory (would be
> > similar for sink factory) to let planner handle which mode (streaming vs
> > batch) of source should be instantiated? That way we don't need to always
> > let connector developers handling an if-else on isStreamingMode.
> > 2. I'm not sure of the benefits to have a CatalogTableContext class. The
> > path, table, and config are fairly independent of each other. So why not
> > pass the config in as 3rd parameter as `createXxxTableSource(path,
> > catalogTable, tableConfig)?
> >
> >
> > On Tue, Jan 14, 2020 at 7:03 PM Jingsong Li 
> wrote:
> >
> >> Hi dev,
> >>
> >> I'd like to kick off a discussion on the improvement of
> TableSourceFactory
> >> and TableSinkFactory.
> >>
> >> Motivation:
> >> Now the main needs and problems are:
> >> 1.Connector can't get TableConfig [1], and some behaviors really need
> to be
> >> controlled by the user's table configuration. In the era of catalog, we
> >> can't put these config in connector properties, which is too
> inconvenient.
> >> 2.Connector can't know if this is batch or stream execution mode. But
> the
> >> sink implementation of batch and stream is totally different. I
> understand
> >> there is an update mode property now, but it splits the batch and
> stream in
> >> the catalog dimension. In fact, this information can be obtained through
> >> the current TableEnvironment.
> >> 3.No interface to call validation. Now our validation is more util
> classes.
> >> It depends on whether or not the connector calls. Now we have some new
> >> validations to add, such as [2], which is really confuse uses, even
> >> developers. Another problem is that our SQL update (DDL) does not have
> >> validation [3]. It is better to report an error when executing DDL,
> >> otherwise it will confuse the user.
> >>
> >> Proposed change draft for 1 and 2:
> >>
> >> interface CatalogTableContext {
> >>     ObjectPath getTablePath();
> >>     CatalogTable getTable();
> >>     ReadableConfig getTableConfig();
> >>     boolean isStreamingMode();
> >> }
> >>
> >> public interface TableSourceFactory<T> extends TableFactory {
> >>
> >>     default TableSource<T> createTableSource(CatalogTableContext context) {
> >>         return createTableSource(context.getTablePath(), context.getTable());
> >>     }
> >>
> >>     ..
> >> }
> >>
> >> Proposed change draft for 3:
> >>
> >> 
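As a usage illustration of the draft above, here is a self-contained toy sketch (stub types, not the real Flink interfaces; the names are assumptions for illustration) of how a connector factory could branch on the execution mode via the proposed context instead of an update-mode property:

```java
public class TableFactoryContextDemo {

    /** Stub of the proposed context; the real draft also exposes path, table and config. */
    interface CatalogTableContext {
        boolean isStreamingMode();
    }

    /** Stub source abstraction. */
    interface TableSource {
        String explainSource();
    }

    /** Illustrative factory: one create method, branching on the mode taken from the context. */
    static class DemoTableSourceFactory {
        TableSource createTableSource(CatalogTableContext context) {
            return context.isStreamingMode()
                    ? () -> "demo streaming source"
                    : () -> "demo batch source";
        }
    }

    public static void main(String[] args) {
        DemoTableSourceFactory factory = new DemoTableSourceFactory();
        System.out.println(factory.createTableSource(() -> true).explainSource());
        System.out.println(factory.createTableSource(() -> false).explainSource());
    }
}
```

The point of the single create method is that the planner decides the mode once and passes it in, so connector code never needs a separate batch/stream entry point.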

Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Jingsong Li
Congratulations Dian Fu. Well deserved!

Best,
Jingsong Lee

On Thu, Jan 16, 2020 at 6:26 PM jincheng sun 
wrote:

> Congrats Dian Fu and welcome on board!
>
> Best,
> Jincheng
>
> Shuo Cheng  wrote on Thu, Jan 16, 2020 at 6:22 PM:
>
>> Congratulations!  Dian Fu
>>
>> > Xingbo Wei Zhong  wrote on Thu, Jan 16, 2020 at 6:13 PM:  jincheng sun
>> wrote on Thu, Jan 16, 2020 at 5:58 PM:
>>
>

-- 
Best, Jingsong Lee


[jira] [Created] (FLINK-15615) Docs: wrong guarantees stated for the file sink

2020-01-16 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-15615:
-

 Summary: Docs: wrong guarantees stated for the file sink
 Key: FLINK-15615
 URL: https://issues.apache.org/jira/browse/FLINK-15615
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.9.1, 1.10.0
Reporter: Roman Khachatryan


[https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/connectors/guarantees.html]

Instead of "at least once" it should be "exactly once".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread jincheng sun
Congrats Dian Fu and welcome on board!

Best,
Jincheng

Shuo Cheng wrote on Thu, Jan 16, 2020 at 6:22 PM:

> Congratulations!  Dian Fu
>
> > Xingbo; Wei Zhong wrote on Thu, Jan 16, 2020 at 6:13 PM; jincheng sun
> wrote on Thu, Jan 16, 2020 at 5:58 PM:
>


[ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Shuo Cheng
Congratulations!  Dian Fu

> Xingbo; Wei Zhong wrote on Thu, Jan 16, 2020 at 6:13 PM; jincheng sun
> wrote on Thu, Jan 16, 2020 at 5:58 PM:


Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Xingbo Huang
Congratulations, Dian.
Well deserved!

Best,
Xingbo


Wei Zhong wrote on Thu, Jan 16, 2020 at 6:13 PM:

> Congrats Dian Fu! Well deserved!
>
> Best,
> Wei
>
> On Jan 16, 2020, at 18:10, Hequn Cheng wrote:
>
> Congratulations, Dian.
> Well deserved!
>
> Best, Hequn
>
> On Thu, Jan 16, 2020 at 6:08 PM Leonard Xu  wrote:
>
>> Congratulations!  Dian Fu
>>
>> Best,
>> Leonard
>>
>> On Jan 16, 2020, at 18:00, Jeff Zhang wrote:
>>
>> Congrats Dian Fu !
>>
>> jincheng sun wrote on Thu, Jan 16, 2020 at 5:58 PM:
>>
>>> Hi everyone,
>>>
>>> I'm very happy to announce that Dian accepted the offer of the Flink PMC
>>> to become a committer of the Flink project.
>>>
>>> Dian Fu has been contributing to Flink for many years. Dian Fu played an
>>> essential role in PyFlink/CEP/SQL/Table API modules. Dian Fu has
>>> contributed several major features, reported and fixed many bugs, spent a
>>> lot of time reviewing pull requests and also frequently helping out on the
>>> user mailing lists and check/vote the release.
>>>
>>> Please join me in congratulating Dian on becoming a Flink committer!
>>>
>>> Best,
>>> Jincheng(on behalf of the Flink PMC)
>>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>>
>>
>


[jira] [Created] (FLINK-15614) Consolidate documentation on how to integrate Hadoop

2020-01-16 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-15614:


 Summary: Consolidate documentation on how to integrate Hadoop
 Key: FLINK-15614
 URL: https://issues.apache.org/jira/browse/FLINK-15614
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, Project Website
Affects Versions: 1.9.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler


Instructions on how to use Flink with Hadoop are spread out over multiple pages:
* https://flink.apache.org/downloads.html
* 
https://ci.apache.org/projects/flink/flink-docs-release-1.9/flinkDev/building.html#hadoop-versions
* 
https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/deployment/hadoop.html

We should consolidate the information on a single page that we link to from 
elsewhere.

Additionally, we should clarify under what circumstances the flink-shaded build 
instructions are useful, i.e., when exporting the Hadoop classpath causes 
issues, when a custom Hadoop version is required, etc. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Wei Zhong
Congrats Dian Fu! Well deserved!

Best,
Wei

> On Jan 16, 2020, at 18:10, Hequn Cheng wrote:
> 
> Congratulations, Dian.
> Well deserved!
> 
> Best, Hequn 
> 
> On Thu, Jan 16, 2020 at 6:08 PM Leonard Xu wrote:
> Congratulations!  Dian Fu
> 
> Best,
> Leonard
> 
>> On Jan 16, 2020, at 18:00, Jeff Zhang (zjf...@gmail.com) wrote:
>> 
>> Congrats Dian Fu !
>> 
>> jincheng sun (sunjincheng...@gmail.com) wrote on Thu, Jan 16, 2020 at 5:58 PM:
>> Hi everyone,
>> 
>> I'm very happy to announce that Dian accepted the offer of the Flink PMC to 
>> become a committer of the Flink project.
>> 
>> Dian Fu has been contributing to Flink for many years. Dian Fu played an 
>> essential role in PyFlink/CEP/SQL/Table API modules. Dian Fu has contributed 
>> several major features, reported and fixed many bugs, spent a lot of time 
>> reviewing pull requests and also frequently helping out on the user mailing 
>> lists and check/vote the release.
>>  
>> Please join me in congratulating Dian on becoming a Flink committer!
>> 
>> Best, 
>> Jincheng(on behalf of the Flink PMC)
>> 
>> 
>> -- 
>> Best Regards
>> 
>> Jeff Zhang
> 



Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Hequn Cheng
Congratulations, Dian.
Well deserved!

Best, Hequn

On Thu, Jan 16, 2020 at 6:08 PM Leonard Xu  wrote:

> Congratulations!  Dian Fu
>
> Best,
> Leonard
>
> On Jan 16, 2020, at 18:00, Jeff Zhang wrote:
>
> Congrats Dian Fu !
>
> jincheng sun wrote on Thu, Jan 16, 2020 at 5:58 PM:
>
>> Hi everyone,
>>
>> I'm very happy to announce that Dian accepted the offer of the Flink PMC
>> to become a committer of the Flink project.
>>
>> Dian Fu has been contributing to Flink for many years. Dian Fu played an
>> essential role in PyFlink/CEP/SQL/Table API modules. Dian Fu has
>> contributed several major features, reported and fixed many bugs, spent a
>> lot of time reviewing pull requests and also frequently helping out on the
>> user mailing lists and check/vote the release.
>>
>> Please join me in congratulating Dian on becoming a Flink committer!
>>
>> Best,
>> Jincheng(on behalf of the Flink PMC)
>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>
>
>


Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Leonard Xu
Congratulations!  Dian Fu

Best,
Leonard

> On Jan 16, 2020, at 18:00, Jeff Zhang wrote:
> 
> Congrats Dian Fu !
> 
> jincheng sun (sunjincheng...@gmail.com) wrote on Thu, Jan 16, 2020 at 5:58 PM:
> Hi everyone,
> 
> I'm very happy to announce that Dian accepted the offer of the Flink PMC to 
> become a committer of the Flink project.
> 
> Dian Fu has been contributing to Flink for many years. Dian Fu played an 
> essential role in PyFlink/CEP/SQL/Table API modules. Dian Fu has contributed 
> several major features, reported and fixed many bugs, spent a lot of time 
> reviewing pull requests and also frequently helping out on the user mailing 
> lists and check/vote the release.
>  
> Please join me in congratulating Dian on becoming a Flink committer!
> 
> Best, 
> Jincheng(on behalf of the Flink PMC)
> 
> 
> -- 
> Best Regards
> 
> Jeff Zhang



Re: [DISCUSS] Improve TableFactory

2020-01-16 Thread Timo Walther

Hi Jingsong,

+1 for adding a context in the source and sink factories. A context 
class also allows for future modifications without touching the 
TableFactory interface again.


How about:

interface TableSourceFactory<T> {
  interface Context {
    // ...
  }
}

Because I find the name `CatalogTableContext` confusing, and we can bind 
the interface to the factory class itself as an inner interface.


Read-only access to the configuration also sounds right to me. Can we remove 
the `ObjectPath getTablePath()` method? I don't see a reason why a 
factory should know the path. And if so, it should be an 
`ObjectIdentifier` instead, so it also knows about the catalog we are using.


The `isStreamingMode()` method should be renamed to `isBounded()`, because we 
would like to use terminology around boundedness rather than 
streaming/batch.
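
Putting these suggestions together, the factory with an inner `Context` might look as follows. This is only a sketch of the proposal under discussion, not the final Flink API; `CatalogTable`, `ReadableConfig`, and `ObjectIdentifier` below are minimal stand-in types so the snippet compiles on its own.

```java
// Minimal stand-ins for the real Flink types (assumptions for this sketch):
interface CatalogTable {}
interface ReadableConfig {}

// Full table path including the catalog, as suggested instead of ObjectPath.
final class ObjectIdentifier {
    private final String catalog, database, table;

    ObjectIdentifier(String catalog, String database, String table) {
        this.catalog = catalog;
        this.database = database;
        this.table = table;
    }

    String asSummaryString() {
        return catalog + "." + database + "." + table;
    }
}

interface TableSourceFactory<T> {
    // Context bound to the factory itself, so it can evolve
    // without touching the TableFactory interface again.
    interface Context {
        ObjectIdentifier getObjectIdentifier();
        CatalogTable getTable();
        ReadableConfig getConfiguration();
        // Boundedness terminology instead of streaming/batch.
        boolean isBounded();
    }

    T createTableSource(Context context);
}

public class ContextSketch {
    public static void main(String[] args) {
        ObjectIdentifier id = new ObjectIdentifier("myCatalog", "myDb", "myTable");
        // The identifier carries the catalog, unlike a bare ObjectPath.
        System.out.println(id.asSummaryString());
    }
}
```

The inner interface keeps the context's name scoped to the factory, so adding a getter later is a compatible change.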


@Bowen: We are in the process of unifying the APIs and thus explicitly 
avoid specialized methods in the future.


Can we postpone the change of TableValidators? I don't think that every 
factory needs a schema validator. Ideally, the factory should just 
return a List<ConfigOption> or ConfigOptionGroup that contains the 
validation logic, as mentioned in the validation part of FLIP-54 [1]. But 
currently our config options are not rich enough to have a unified 
validation. Additionally, the factory should return some properties such 
as "supports event-time" for the schema validation outside of the 
factory itself.
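
To make the option-based idea concrete, here is a hedged sketch of what declarative validation could look like. The `Option` class is a deliberately simplified stand-in for a `ConfigOption`-style descriptor (the real FLIP-54 options are much richer); the point is only that a factory declares options and a generic validator applies them.

```java
import java.util.List;
import java.util.Map;

public class OptionValidationSketch {
    // Hypothetical, simplified stand-in for a ConfigOption carrying
    // its own validation metadata (here: just whether it is required).
    static final class Option {
        final String key;
        final boolean required;

        Option(String key, boolean required) {
            this.key = key;
            this.required = required;
        }
    }

    // A generic validator: the factory only declares its options,
    // and validation no longer depends on the connector calling utils.
    static List<String> validate(List<Option> options, Map<String, String> properties) {
        List<String> errors = new java.util.ArrayList<>();
        for (Option o : options) {
            if (o.required && !properties.containsKey(o.key)) {
                errors.add("Missing required option: " + o.key);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<Option> declared = List.of(
            new Option("connector.type", true),
            new Option("format.type", true),
            new Option("format.field-delimiter", false));
        Map<String, String> props = Map.of("connector.type", "filesystem");
        System.out.println(validate(declared, props));
    }
}
```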


Regards,
Timo

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-54%3A+Evolve+ConfigOption+and+Configuration 




On 16.01.20 00:51, Bowen Li wrote:

Hi Jingsong,

The 1st and 2nd pain points you described are very valid, as I'm more
familiar with them. I agree these are shortcomings of the current Flink SQL
design.

A couple of comments on your 1st proposal:

1. is it better to have explicit APIs like "createBatchTableSource(...)"
and "createStreamingTableSource(...)" in TableSourceFactory (it would be
similar for the sink factory) to let the planner handle which mode (streaming
vs batch) of source should be instantiated? That way we don't need to always
make connector developers handle an if-else on isStreamingMode.
2. I'm not sure of the benefits of having a CatalogTableContext class. The
path, table, and config are fairly independent of each other. So why not
pass the config in as a 3rd parameter, as `createXxxTableSource(path,
catalogTable, tableConfig)`?


On Tue, Jan 14, 2020 at 7:03 PM Jingsong Li  wrote:


Hi dev,

I'd like to kick off a discussion on the improvement of TableSourceFactory
and TableSinkFactory.

Motivation:
Now the main needs and problems are:
1. A connector can't get the TableConfig [1], and some behaviors really need
to be controlled by the user's table configuration. In the era of catalogs, we
can't put these configs into connector properties, which is too inconvenient.
2. A connector can't know whether this is batch or stream execution mode. But
the sink implementations for batch and streaming are totally different. I
understand there is an update-mode property now, but it splits batch and
stream at the catalog dimension. In fact, this information can be obtained
through the current TableEnvironment.
3. There is no interface to invoke validation. Today our validation lives in
util classes, and it depends on whether or not the connector calls them. Now
we have some new validations to add, such as [2], which really confuse users
and even developers. Another problem is that our SQL update (DDL) does not
have validation [3]. It is better to report an error when executing the DDL;
otherwise it will confuse the user.

Proposed change draft for 1 and 2:

interface CatalogTableContext {
  ObjectPath getTablePath();
  CatalogTable getTable();
  ReadableConfig getTableConfig();
  boolean isStreamingMode();
}

public interface TableSourceFactory<T> extends TableFactory {

  default TableSource<T> createTableSource(CatalogTableContext context) {
    return createTableSource(context.getTablePath(), context.getTable());
  }

  ..
}
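
To illustrate pain point 2, here is a hedged sketch of how a connector could use such a context to pick its sink implementation and read a table option — neither of which is possible today. The factory method, the `sink.flush-interval` option, and the sink names are all hypothetical, chosen only for illustration.

```java
import java.util.Map;

public class ConnectorModeSketch {
    // Hypothetical connector logic: choose the sink implementation based on
    // the execution mode and a user-supplied table option, both of which the
    // proposed context would expose to the factory.
    static String createSink(boolean isStreamingMode, Map<String, String> config) {
        String flushInterval = config.getOrDefault("sink.flush-interval", "1s");
        if (isStreamingMode) {
            // Streaming: a long-running sink that flushes periodically.
            return "StreamingSink(flushInterval=" + flushInterval + ")";
        }
        // Batch: write once when the bounded input is exhausted.
        return "BatchSink()";
    }

    public static void main(String[] args) {
        Map<String, String> config = Map.of("sink.flush-interval", "5s");
        System.out.println(createSink(true, config));
        System.out.println(createSink(false, config));
    }
}
```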

Proposed change draft for 3:

public interface TableFactory {

  TableValidators validators();

  interface TableValidators {
    ConnectorDescriptorValidator connectorValidator();
    TableSchemaValidator schemaValidator();
    FormatDescriptorValidator formatValidator();
  }
}

What do you think?

[1] https://issues.apache.org/jira/browse/FLINK-15290
[2]

http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-A-mechanism-to-validate-the-precision-of-columns-for-connectors-td36552.html#a36556
[3] https://issues.apache.org/jira/browse/FLINK-15509

Best,
Jingsong Lee







Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Jeff Zhang
Congrats Dian Fu !

jincheng sun wrote on Thu, Jan 16, 2020 at 5:58 PM:

> Hi everyone,
>
> I'm very happy to announce that Dian accepted the offer of the Flink PMC
> to become a committer of the Flink project.
>
> Dian Fu has been contributing to Flink for many years. Dian Fu played an
> essential role in PyFlink/CEP/SQL/Table API modules. Dian Fu has
> contributed several major features, reported and fixed many bugs, spent a
> lot of time reviewing pull requests and also frequently helping out on the
> user mailing lists and check/vote the release.
>
> Please join me in congratulating Dian on becoming a Flink committer!
>
> Best,
> Jincheng(on behalf of the Flink PMC)
>


-- 
Best Regards

Jeff Zhang


[ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread jincheng sun
Hi everyone,

I'm very happy to announce that Dian accepted the offer of the Flink PMC to
become a committer of the Flink project.

Dian Fu has been contributing to Flink for many years. Dian Fu played an
essential role in PyFlink/CEP/SQL/Table API modules. Dian Fu has
contributed several major features, reported and fixed many bugs, spent a
lot of time reviewing pull requests, and also frequently helped out on the
user mailing lists and checked/voted on releases.

Please join me in congratulating Dian on becoming a Flink committer!

Best,
Jincheng(on behalf of the Flink PMC)


[jira] [Created] (FLINK-15613) execute sql appear "java.lang.IndexOutOfBoundsException"

2020-01-16 Thread xiaojin.wy (Jira)
xiaojin.wy created FLINK-15613:
--

 Summary: execute sql appear "java.lang.IndexOutOfBoundsException"
 Key: FLINK-15613
 URL: https://issues.apache.org/jira/browse/FLINK-15613
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Client
Affects Versions: 1.10.0
 Environment: *The input data is:*

0.0
1004.30
-34.84
1.2345678901234E200
1.2345678901234E-200

*The sql-client conf is:*

execution:
 planner: blink
 type: batch
Reporter: xiaojin.wy
 Fix For: 1.10.0


*The sql is* :

CREATE TABLE `int8_tbl` (
q1 bigint, q2 bigint
) WITH (
'connector.path'='/test_join/sources/int8_tbl.csv',
'format.empty-column-as-null'='true',
'format.field-delimiter'='|',
'connector.type'='filesystem',
'format.derive-schema'='true',
'format.type'='csv'
);

select * from int8_tbl i1 left join (select * from int8_tbl i2 join (select 123 
as x) ss on i2.q1 = x) as i3 on i1.q2 = i3.q2 order by 1, 2;

 

*The output after executing the sql is:*
[ERROR] Could not execute SQL statement. Reason:
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1





 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15612) Implement a DataTypeLookup

2020-01-16 Thread Timo Walther (Jira)
Timo Walther created FLINK-15612:


 Summary: Implement a DataTypeLookup
 Key: FLINK-15612
 URL: https://issues.apache.org/jira/browse/FLINK-15612
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API, Table SQL / Planner
Reporter: Timo Walther
Assignee: Timo Walther


Implement the {{DataTypeLookup}} interface and make it available in both the 
Table API and SQL planners. This is necessary to resolve {{RAW}} types and to 
have access to the catalog for types in the future.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Let Flink SQL PlannerExpressionParserImpl#FieldReference use Unicode as its default charset

2020-01-16 Thread Shoi Liu
Hi,
  What I am talking about is the `PlannerExpressionParserImpl`, which is
written with the Scala parser combinator library. Every time we call
StreamTableEnvironment#fromDataStream, the field string (or scala.Symbol
via the Scala API) is parsed by `PlannerExpressionParserImpl` into an
`Expression`.
  As the parser grammar in `PlannerExpressionParserImpl` shows,
`fieldReference` is defined as `*` or `ident`, and `ident` in
`PlannerExpressionParserImpl` is just the one from
[[scala.util.parsing.combinator.JavaTokenParsers]], which is a Java
identifier.

  After discussing with Jark, I also discovered that
`PlannerExpressionParserImpl` currently does not even support quoting ('`').
I didn't know about the Calcite-related part you mentioned before, but it
doesn't matter. Maybe we can let PlannerExpressionParserImpl#fieldReference
use Unicode as its default charset and support '`' as a first step, and then
make the whole project support the Unicode charset once the Calcite-related
part is available.

By the way, I attended your lecture on Calcite at FFA Asia, which really
inspired me a lot~

Best Regards,
Shoi Liu










------ Original Message ------
From: "Danny Chan"
> Design doc:
> https://docs.google.com/document/d/1wo5byn_6K_YOKiPdXNav1zgzt9IBC3SbPvpPnIShtXk/edit#heading=h.g4bnumde4dl5
>
> Best, Danny Chan
>
> On Jan 15, 2020 at 11:08 PM +0800, the earlier mail
> (see https://issues.apache.org/jira/browse/FLINK-15573) said:

  As the title says, what I want to do is let the `fieldReference` use
Unicode as its default charset (or maybe as an optional charset which can
be configured).
  According to the `PlannerExpressionParserImpl`, Flink currently uses the
Java identifier as the default charset of `fieldReference`. But, from my
perspective, it is not enough. Consider a user who uses Elasticsearch as a
sink: we all know that ES has a field called `@timestamp`, which a Java
identifier cannot express.

  So in my team, we just let `PlannerExpressionParserImpl#fieldReference`
use Unicode as its default charset, which solves this kind of problem.
(Please refer to the issue I mentioned above.)

In my opinion, the change shall be general purpose:
Firstly, MySQL supports Unicode as the default field charset (see the field
named `@@`), so shall we support Unicode also?
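
The limitation is easy to demonstrate with the JDK itself: Java identifiers already admit Unicode letters, so the real blocker for a field like `@timestamp` is the `@` character, which is exactly why backtick quoting (or a wider charset) is needed. A small self-contained check, using the same `Character` rules that `JavaTokenParsers#ident` follows:

```java
public class IdentifierCheck {
    // True if s is a valid Java identifier -- the charset the current
    // PlannerExpressionParserImpl accepts for field references.
    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isJavaIdentifier("timestamp"));        // plain ASCII name
        System.out.println(isJavaIdentifier("@timestamp"));       // '@' is rejected
        System.out.println(isJavaIdentifier("\u5b57\u6bb5"));     // CJK letters are already allowed
    }
}
```

So the proposal is really about characters like `@` and about quoting, not about Unicode letters per se, which Java identifiers already cover.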