Re: [DISCUSS] Drop Savepoint Compatibility with Flink 1.2

2020-02-23 Thread Yu Li
+1 for dropping savepoint compatibility with Flink 1.2.

Best Regards,
Yu


On Sat, 22 Feb 2020 at 22:05, Ufuk Celebi  wrote:

> Hey Stephan,
>
> +1.
>
> Reading over the linked ticket and your description here, I think it makes
> a lot of sense to go ahead with this. Since it's possible to upgrade via
> intermediate Flink releases as a fail-safe, I don't have any concerns.
>
> – Ufuk
>
>
> On Fri, Feb 21, 2020 at 4:34 PM Till Rohrmann 
> wrote:
> >
> > +1 for dropping savepoint compatibility with Flink 1.2.
> >
> > Cheers,
> > Till
> >
> > On Thu, Feb 20, 2020 at 6:55 PM Stephan Ewen  wrote:
> >>
> >> Thank you for the feedback.
> >>
> >> Here is the JIRA issue with some more explanation also about the
> background and implications:
> >> https://jira.apache.org/jira/browse/FLINK-16192
> >>
> >> Best,
> >> Stephan
> >>
> >>
> >> On Thu, Feb 20, 2020 at 2:26 PM vino yang 
> wrote:
> >>>
> >>> +1 for dropping Savepoint compatibility with Flink 1.2
> >>>
> >>> Flink 1.2 is quite far away from the latest 1.10. Especially after the
> release of Flink 1.9 and 1.10, the code and architecture have undergone
> major changes.
> >>>
> >>> Currently, I am updating state migration tests for Flink 1.10. I can
> still see some binary snapshot files of version 1.2. If we agree on this
> topic, we may be able to alleviate some of the burden (removing those
> binary files) when the migration tests are updated later.
> >>>
> >>> Best,
> >>> Vino
> >>>
> Theo Diefenthal wrote on Thu, Feb 20, 2020 at 9:04 PM:
> 
>  +1 for dropping compatibility.
> 
 I personally think it is very important for a project to keep a
> good development pace, which means old legacy stuff must be dropped from
> time to time. As long as there is an upgrade route (via going through
> another Flink release), that's fine.
> 
>  
 From: "Stephan Ewen" 
 To: "dev" , "user" 
 Sent: Thursday, February 20, 2020, 11:11:43
 Subject: [DISCUSS] Drop Savepoint Compatibility with Flink 1.2
> 
>  Hi all!
>  For some cleanup and simplifications, it would be helpful to drop
> Savepoint compatibility with Flink version 1.2. That version was released
> almost three years ago.
> 
>  I would expect that no one uses that old version any more in a way
> that they actively want to upgrade directly to 1.11.
> 
 Even if someone does, there is still the option to first upgrade to
> another version (like 1.9) and then upgrade to 1.11 from there.
> 
>  Any concerns to drop that support?
> 
>  Best,
>  Stephan
> 
> 
>  --
>  SCOOP Software GmbH - Gut Maarhausen - Eiler Straße 3 P - D-51107 Köln
>  Theo Diefenthal
> 
>  T +49 221 801916-196 - F +49 221 801916-17 - M +49 160 90506575
>  theo.diefent...@scoop-software.de - www.scoop-software.de
>  Sitz der Gesellschaft: Köln, Handelsregister: Köln,
>  Handelsregisternummer: HRB 36625
>  Geschäftsführung: Dr. Oleg Balovnev, Frank Heinen,
>  Martin Müller-Rohde, Dr. Wolfgang Reddig, Roland Scheel
>
>>


Re: FsStateBackend vs RocksDBStateBackend

2020-02-23 Thread Yu Li
Yes, FsStateBackend would be the best fit for state access performance in
this case. Just a reminder that FsStateBackend uploads the full dataset
to DFS during checkpointing, so please watch the network bandwidth usage
and make sure it doesn't become a new bottleneck.

Best Regards,
Yu


On Fri, 21 Feb 2020 at 20:56, Robert Metzger  wrote:

> I would try the FsStateBackend in this scenario, as you have enough memory
> available.
>
> On Thu, Jan 30, 2020 at 5:26 PM Ran Zhang  wrote:
>
>> Hi Gordon,
>>
>> Thanks for your reply! Regarding state size - we are at 200-300 GB, but
>> with a parallelism of 120 each task handles ~2-3 GB of state. (When we
>> submit the job, we set the TM memory to 15 GB.) In this scenario, what
>> would be the best fit for the state backend?
>>
>> Thanks,
>> Ran
>>
>> On Wed, Jan 29, 2020 at 6:37 PM Tzu-Li (Gordon) Tai 
>> wrote:
>>
>>> Hi Ran,
>>>
>>> On Thu, Jan 30, 2020 at 9:39 AM Ran Zhang 
>>> wrote:
>>>
 Hi all,

 We have a Flink app that uses a KeyedProcessFunction, and in the
 function it requires a ValueState(of TreeSet) and the processElement method
 needs to access and update it. We tried to use RocksDB as our stateBackend
 but the performance is not good, and intuitively we think it was because of
 the serialization / deserialization on each processElement call.

>>>
>>> As you have already pointed out, serialization behaviour is a major
>>> difference between the 2 state backends, and will directly impact
>>> performance due to the extra runtime overhead in RocksDB.
>>> If you plan to continue using the RocksDB state backend, make sure to
>>> use MapState instead of ValueState where possible, since every access to
>>> the ValueState in the RocksDB backend requires serializing / deserializing
>>> the whole value.
>>> For MapState, de-/serialization happens per K-V access. Whether or not
>>> this makes sense would of course depend on your state access pattern.
>>>
>>>
 Then we tried to switch to use FsStateBackend (which keeps the
 in-flight data in the TaskManager’s memory according to doc), and it could
 resolve the performance issue. *So we want to understand better what
 are the tradeoffs in choosing between these 2 stateBackend.* Our
 checkpoint size is 200 - 300 GB in stable state. For now we know one
 benefits of RocksDB is it supports incremental checkpoint, but would love
 to know what else we are losing in choosing FsStateBackend.

>>>
>>> As of now, feature-wise both backends support asynchronous snapshotting,
>>> state schema evolution, and access via the State Processor API.
>>> In the end, the major factor for deciding between the two state backends
>>> would be your expected state size.
>>> That being said, it could be possible in the future that savepoint
>>> formats for the backends are changed to be compatible, meaning that you
>>> will be able to switch between different backends upon restore [1].
>>>
>>>

 Thanks a lot!
 Ran Zhang

>>>
>>> Cheers,
>>> Gordon
>>>
>>>  [1]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Binary+format+for+Keyed+State
>>>
>>
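The ValueState-vs-MapState point discussed above can be illustrated with a small, self-contained sketch — plain Python with pickle standing in for Flink's serializers. This is a conceptual cost model only, not Flink API; all names here are illustrative:

```python
import pickle

def value_state_update(raw_state: bytes, element: int) -> bytes:
    """Mimics RocksDB ValueState: the WHOLE value is deserialized,
    mutated, and reserialized on every processElement call."""
    tree = pickle.loads(raw_state)   # full deserialize per element
    tree.add(element)
    return pickle.dumps(tree)        # full serialize per element

def map_state_update(raw_state: dict, element: int) -> None:
    """Mimics RocksDB MapState: only the single touched entry crosses
    the serialization boundary."""
    raw_state[pickle.dumps(element)] = pickle.dumps(True)

# ValueState-like: O(n) serialization work per element.
state = pickle.dumps(set())
for e in range(1000):
    state = value_state_update(state, e)
assert len(pickle.loads(state)) == 1000

# MapState-like: O(1) serialization work per element.
entries = {}
for e in range(1000):
    map_state_update(entries, e)
assert len(entries) == 1000
```

The first loop rewrites the entire set on every access, which matches the slowdown observed with a ValueState of TreeSet on RocksDB; the second only de-/serializes the one entry it touches.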


Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-23 Thread Yu Li
Congratulations Jingsong! Well deserved.

Best Regards,
Yu


On Mon, 24 Feb 2020 at 14:10, Congxian Qiu  wrote:

> Congratulations Jingsong!
>
> Best,
> Congxian
>
>
> jincheng sun wrote on Mon, Feb 24, 2020 at 1:38 PM:
>
>> Congratulations Jingsong!
>>
>> Best,
>> Jincheng
>>
>>
>> Zhu Zhu wrote on Mon, Feb 24, 2020 at 11:55 AM:
>>
>>> Congratulations Jingsong!
>>>
>>> Thanks,
>>> Zhu Zhu
>>>
>>> Fabian Hueske wrote on Sat, Feb 22, 2020 at 1:30 AM:
>>>
 Congrats Jingsong!

 Cheers, Fabian

 Am Fr., 21. Feb. 2020 um 17:49 Uhr schrieb Rong Rong <
 walter...@gmail.com>:

 > Congratulations Jingsong!!
 >
 > Cheers,
 > Rong
 >
 > On Fri, Feb 21, 2020 at 8:45 AM Bowen Li  wrote:
 >
 > > Congrats, Jingsong!
 > >
 > > On Fri, Feb 21, 2020 at 7:28 AM Till Rohrmann >>> >
 > > wrote:
 > >
 > >> Congratulations Jingsong!
 > >>
 > >> Cheers,
 > >> Till
 > >>
 > >> On Fri, Feb 21, 2020 at 4:03 PM Yun Gao 
 wrote:
 > >>
 > >>>   Congratulations Jingsong!
 > >>>
 > >>>Best,
 > >>>Yun
 > >>>
 > >>> --
 > >>> From:Jingsong Li 
 > >>> Send Time:2020 Feb. 21 (Fri.) 21:42
 > >>> To:Hequn Cheng 
 > >>> Cc:Yang Wang ; Zhijiang <
 > >>> wangzhijiang...@aliyun.com>; Zhenghua Gao ;
 godfrey
 > >>> he ; dev ; user <
 > >>> user@flink.apache.org>
 > >>> Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
 > >>>
 > >>> Thanks everyone~
 > >>>
 > >>> It's my pleasure to be part of the community. I hope I can make a
 > better
 > >>> contribution in future.
 > >>>
 > >>> Best,
 > >>> Jingsong Lee
 > >>>
 > >>> On Fri, Feb 21, 2020 at 2:48 PM Hequn Cheng 
 wrote:
 > >>> Congratulations Jingsong! Well deserved.
 > >>>
 > >>> Best,
 > >>> Hequn
 > >>>
 > >>> On Fri, Feb 21, 2020 at 2:42 PM Yang Wang 
 > wrote:
>  > >>> Congratulations, Jingsong! Well deserved.
 > >>>
 > >>>
 > >>> Best,
 > >>> Yang
 > >>>
>  > >>> Zhijiang wrote on Fri, Feb 21, 2020 at 1:18 PM:
 > >>> Congrats Jingsong! Welcome on board!
 > >>>
 > >>> Best,
 > >>> Zhijiang
 > >>>
 > >>> --
 > >>> From:Zhenghua Gao 
 > >>> Send Time:2020 Feb. 21 (Fri.) 12:49
 > >>> To:godfrey he 
 > >>> Cc:dev ; user 
 > >>> Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
 > >>>
 > >>> Congrats Jingsong!
 > >>>
 > >>>
 > >>> *Best Regards,*
 > >>> *Zhenghua Gao*
 > >>>
 > >>>
 > >>> On Fri, Feb 21, 2020 at 11:59 AM godfrey he 
 > wrote:
 > >>> Congrats Jingsong! Well deserved.
 > >>>
 > >>> Best,
 > >>> godfrey
 > >>>
>  > >>> Jeff Zhang wrote on Fri, Feb 21, 2020 at 11:49 AM:
>  > >>> Congratulations, Jingsong! You deserve it.
 > >>>
>  > >>> wenlong.lwl wrote on Fri, Feb 21, 2020 at 11:43 AM:
 > >>> Congrats Jingsong!
 > >>>
 > >>> On Fri, 21 Feb 2020 at 11:41, Dian Fu 
 wrote:
 > >>>
 > >>> > Congrats Jingsong!
 > >>> >
>  > >>> > > On Feb 21, 2020, at 11:39 AM, Jark Wu wrote:
 > >>> > >
 > >>> > > Congratulations Jingsong! Well deserved.
 > >>> > >
 > >>> > > Best,
 > >>> > > Jark
 > >>> > >
 > >>> > > On Fri, 21 Feb 2020 at 11:32, zoudan  wrote:
 > >>> > >
 > >>> > >> Congratulations! Jingsong
 > >>> > >>
 > >>> > >>
 > >>> > >> Best,
 > >>> > >> Dan Zou
 > >>> > >>
 > >>> >
 > >>> >
 > >>>
 > >>>
 > >>> --
 > >>> Best Regards
 > >>>
 > >>> Jeff Zhang
 > >>>
 > >>>
 > >>>
 > >>> --
 > >>> Best, Jingsong Lee
 > >>>
 > >>>
 > >>>
 >

>>>


Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-23 Thread Congxian Qiu
Congratulations Jingsong!

Best,
Congxian




Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-23 Thread jincheng sun
Congratulations Jingsong!

Best,
Jincheng




Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-23 Thread Zhu Zhu
Congratulations Jingsong!

Thanks,
Zhu Zhu



Re: Flink on Kubernetes - Session vs Job cluster mode and storage

2020-02-23 Thread Yang Wang
Hi Singh,

Glad to hear that you are looking to run Flink on Kubernetes. I will try
to answer your questions based on my limited knowledge; others could
correct me and add more details.

I think the biggest difference between a session cluster and a per-job
cluster on Kubernetes is isolation. In per-job mode, a dedicated Flink
cluster is started for a single job and no other jobs can be submitted to
it.
Once the job is finished, the Flink cluster is destroyed immediately.
The second point is one-step submission: you do not need to start a Flink
cluster first and then submit a job to the existing session.

> Are there any benefits with regards to
1. Configuring the jobs
Whether you use a per-job cluster or submit to an existing session
cluster, the configuration mechanism is the same. You do not have to
change any code or configuration.

2. Scaling the taskmanager
Since you are using a standalone cluster on Kubernetes, there is no active
resourcemanager. You need to use external tools to monitor and scale up
the taskmanagers. The active Kubernetes integration is still evolving and
you could give it a try[1].

3. Restarting jobs
For a session cluster, you can directly cancel the job and re-submit it.
For a per-job cluster, once the job is canceled, you need to start a new
per-job cluster from the latest savepoint.

4. Managing the flink jobs
The REST API and the flink command line can be used to manage jobs (e.g.
flink cancel, etc.). There is no difference between session and per-job
here.

5. Passing credentials (in case of AWS, etc.)
I am not sure how you provide your credentials. If you put them in a
ConfigMap and mount it into the jobmanager/taskmanager pods, then both
session and per-job modes support this approach.

6. Fault tolerance and recovery of jobs from failure
For a session cluster, if one taskmanager crashes, all the jobs that have
tasks on that taskmanager will fail.
Both session and per-job clusters can be configured with high availability
and recover from the latest checkpoint.

> Is there any need for specifying volume for the pods?
No, you do not need to specify a volume for the pods. All data in a pod's
local directory is temporary. When a pod crashes and is relaunched, the
taskmanager retrieves the checkpoint from ZooKeeper + S3 and resumes from
the latest checkpoint.


[1].
https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html
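For reference, a minimal sketch of the ZooKeeper-HA-plus-S3 setup described above, assuming Flink 1.10 option names; the bucket names and ZooKeeper addresses are placeholders:

```yaml
# Illustrative flink-conf.yaml fragment -- all paths/addresses are placeholders.
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
high-availability.storageDir: s3://my-bucket/flink/ha
state.checkpoints.dir: s3://my-bucket/flink/checkpoints
```

With settings along these lines, a relaunched taskmanager pod needs no persistent volume; state is recovered from S3 via the checkpoint metadata stored in ZooKeeper.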

M Singh wrote on Sun, Feb 23, 2020 at 2:28 AM:

> Hey Folks:
>
> I am trying to figure out the options for running Flink on Kubernetes and
> am trying to find out the pros and cons of running in Flink Session vs
> Flink Cluster mode (
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html#flink-session-cluster-on-kubernetes
> ).
>
> I understand that in job mode there is no need to submit the job since it
> is part of the job image.  But what are other the pros and cons of this
> approach vs session mode where a job manager is deployed and flink jobs can
> be submitted it ?  Are there any benefits with regards to:
>
> 1. Configuring the jobs
> 2. Scaling the taskmanager
> 3. Restarting jobs
> 4. Managing the flink jobs
> 5. Passing credentials (in case of AWS, etc)
> 6. Fault tolerence and recovery of jobs from failure
>
> Also, we will be keeping the checkpoints for the jobs on S3.  Is there any
> need for specifying volume for the pods ?  If volume is required do we need
> provisioned volume and what are the recommended alternatives/considerations
> especially with AWS.
>
> If there are any other considerations, please let me know.
>
> Thanks for your advice.
>
>
>
>
>


Flink job that aggregates Kafka data and writes back to Kafka fails

2020-02-23 Thread chanamper
Hi all, a question: my Flink job reads data from Kafka, performs an aggregation, and writes the results back to Kafka; the Flink version is 1.8.0. After running for a while, the job hits the following exception and then crashes. How can this problem be solved? Thanks.

2020-02-19 10:45:45,314 ERROR 
org.apache.flink.runtime.io.network.netty.PartitionRequestQueue  - Encountered 
error while consuming partitions
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at 
org.apache.flink.shaded.netty4.io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
at 
org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1132)
at 
org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:347)
at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148)
at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656)
at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:591)
at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:508)
at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470)
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909)
at java.lang.Thread.run(Thread.java:748)
2020-02-19 10:45:45,317 INFO  org.apache.kafka.clients.producer.KafkaProducer   
- [Producer clientId=producer-1] Closing the Kafka producer with 
timeoutMillis = 9223372036854775807 ms.
2020-02-19 10:45:45,412 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner   - Memory usage 
stats: [HEAP: 98/6912/6912 MB, NON HEAP: 81/83/-1 MB (used/committed/max)]

2020-02-19 10:45:45,413 INFO  
org.apache.flink.runtime.taskexecutor.TaskManagerRunner   - Direct memory 
stats: Count: 24596, Total Capacity: 806956211, Used Memory: 806956212






2020-02-19 10:50:31,351 WARN  akka.remote.transport.netty.NettyTransport
- Remote connection to [null] failed with 
java.net.ConnectException: Connection refused: aj-flinknode01/9.186.36.80:56983
2020-02-19 10:50:31,351 WARN  akka.remote.ReliableDeliverySupervisor
- Association with remote system 
[akka.tcp://flink@aj-flinknode01:56983] has failed, address is now gated for 
[50] ms. Reason: [Association failed with 
[akka.tcp://flink@aj-flinknode01:56983]] Caused by: [Connection refused: 
aj-flinknode01/9.186.36.80:56983]
2020-02-19 10:50:55,419 WARN  akka.remote.ReliableDeliverySupervisor
- Association with remote system 
[akka.tcp://flink@aj-flinknode01:45703] has failed, address is now gated for 
[50] ms. Reason: [Disassociated]
2020-02-19 10:50:56,370 INFO  org.apache.flink.yarn.YarnResourceManager 
- Closing TaskExecutor connection 
container_1578492316659_0830_01_06 because: Container 
[pid=30031,containerID=container_1578492316659_0830_01_06] is running 
beyond physical memory limits. Current usage: 10.0 GB of 10 GB physical memory 
used; 11.8 GB of 21 GB virtual memory used. Killing container.
Dump of the process-tree for container_1578492316659_0830_01_06 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 30068 30031 30031 30031 (java) 277668 18972 12626370560 2630988 
/data/jdk1.8.0_211/bin/java -Xms6912m -Xmx6912m -XX:MaxDirectMemorySize=3328m 
-XX:+UseG1GC 
-Dlog.file=/data/hadoop-2.6.0-cdh5.16.1/logs/userlogs/application_1578492316659_0830/container_1578492316659_0830_01_06/taskmanager.log
 -Dlogback.configurationFile=file:./logback.xml 
-Dlog4j.configuration=file:./log4j.properties 
org.apache.flink.yarn.YarnTaskExecutorRunner --configDir .
|- 30031 30027 30031 30031 (bash) 0 0 11001856 329 /bin/bash -c 
/data/jdk1.8.0_211/bin/java -Xms6912m -Xmx6912m -XX:MaxDirectMemorySize=3328m 
-XX:+UseG1GC 
-Dlog.file=/data/hadoop-2.6.0-cdh5.16.1/logs/userlogs/application_1578492316659_0830/container_1578492316659_0830_01_06/taskmanager.log
 -Dlogback.configurationFile=file:./logback.xml 
-Dlog4j.configuration=file:./log4j.properties 
org.apache.flink.yarn.YarnTaskExecutorRunner --configDir . 1> 
/data/hadoop-2.6.0-cdh5.16.1/logs/userlogs/application_1578492316659_0830/container_1578492316659_0830_01_06/taskmanager.out
 2> 
/data/hadoop-2.6.0-cdh5.16.1/logs/userlogs/application_1578492316659_0830/container_1578492316659_0830_01_06/taskmanager.err

Re: yarn session: one JVM per task

2020-02-23 Thread Xintong Song
Hi David,

In general, I don't think you can control all parallel subtasks of a
certain task run in the same JVM process with the current Flink.

If your job scale is very small, one thing you might try is to have only
one task manager in the Flink session cluster. You need to make sure the
task manager has enough cpu/memory resources and slots for running your job.

Thank you~

Xintong Song
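A sketch of the single-TaskManager session setup Xintong describes, assuming Flink 1.10 option names; the slot count and memory size are placeholders to be matched to your job's maximum parallelism:

```yaml
# Illustrative flink-conf.yaml fragment -- values are placeholders.
# A single large TaskManager means all subtasks scheduled to it share one JVM.
taskmanager.numberOfTaskSlots: 8        # >= the job's maximum parallelism
taskmanager.memory.process.size: 8192m
```

Note this only works while the job is small enough to fit into one TaskManager; it does not guarantee co-location in general.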



On Sun, Feb 23, 2020 at 1:11 AM David Morin 
wrote:

> Hi,
> My app is based on a lib that is not thread safe (yet...).
> While waiting for the patch to be merged, how can I be sure that my Sink
> that uses this lib runs in a single JVM?
> Context: I use one Yarn session and send my Flink jobs to this session
>
> Regards,
> David
>


[ANNOUNCE] Weekly Community Update 2020/07

2020-02-23 Thread Konstantin Knauf
Dear community,

happy to share this week's community digest with updates on the next
release cycle, a set of proposals for Flink's web user interface, a couple
of discussions around our development process, and a bit more.

Flink Development
==

* [releases] Stephan proposes an "anticipated feature freeze date" for
Flink 1.11 around the end of April, and hence a release in May. This would
make the next release a short one, which seems to be generally well
received. Piotr and Zhijiang will be our release managers for Flink 1.11.
[1]

* [releases] flink-shaded 10.0 has been released. [2]

* [web ui] Yadong has split up the improvement proposal to Flink's web user
interface (FLIP-75) into multiple votes. Each thread contains a live demo
of the proposed feature.
  * [FLIP-98]: Highlight backpressure task in job graph [3]
  * [FLIP-99]: Make maximally shown exceptions configurable [4]
  * [FLIP-100] Show the attempt history of tasks [5]
  * [FLIP-101] Expose Information on Pending Slots in the Web User
Interface [6]
  * [FLIP-102] More Taskmanager metrics, in particular memory resources [7]
  * [FLIP-103] List all log files and make them downloadable [8]
  * [FLIP-104] More Flink Master metrics, in particular memory resources [9]

* [development process] Hequn Cheng has started a discussion on improving
the FLIP process, in particular the way we deal with discussions on the
wiki, the mailing list, and Google Docs. The discussion is ongoing, but
there seems to be consensus not to use Google Docs more than we do today,
and maybe even to eliminate it from the process altogether. In the
discussion, David raised a related point: the FLIP document is often not
updated after its implementation, and changes between the original plan
and the final implementation are not documented anywhere. [10]

* [development process] Xintong proposes to update the PR description
template as it is quite outdated. The discussion shows that the template is
generally considered useful, but it indeed needs an update. [11]

* [deployment] Aljoscha has started a discussion to either drop or extend
the support for Flink's Windows scripts. [12]

* [connectors] The connector for Elasticsearch 2.x will be dropped in Flink
1.11. Elasticsearch 5.x will still be supported in Flink 1.11 and
revisited for Flink 1.12. [13]

* [state] Stephan proposes to drop savepoint compatibility with Flink 1.2.
A stateful upgrade from Flink 1.2 to Flink 1.11 would still be possible by
first upgrading to an intermediate Flink version. [14]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Kicking-off-the-1-11-release-cycle-tp37817.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-shaded-10-0-released-tp37815.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-98-Better-Back-Pressure-Detection-tp37893.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-99-Make-Max-Exception-Configurable-tp37895.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-100-Add-Attempt-Information-tp37896.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-101-Add-Pending-Slots-Detail-tp37897.html
[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-102-Add-More-Metrics-to-TaskManager-tp37898.html
[8]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-103-Better-TM-JM-Log-Display-tp37899.html
[9]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-104-Add-More-Metrics-to-Jobmanager-tp37901.html
[10]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvements-on-FLIP-Process-tp37785.html
[11]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discuss-Update-the-pull-request-description-template-tp37755.html
[12]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Extend-or-maintain-shell-script-support-for-Windows-tp37868.html
[13]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Drop-connectors-for-Elasticsearch-2-x-and-5-x-tp37471.html
[14]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Drop-Savepoint-Compatibility-with-Flink-1-2-tp37908.html

Notable Bugs
============

* [FLINK-16111] [1.10] Flink's Kubernetes deployment does not respect the
"taskmanager.cpu.cores" configuration [15]
* [FLINK-16115] [1.10] The OSS (Alibaba Cloud's Object Storage Service)
filesystem does not work as a plugin. [16]

[15] https://issues.apache.org/jira/browse/FLINK-16111
[16] https://issues.apache.org/jira/browse/FLINK-16115

Events, Blog Posts, Misc
========================

* Jingsong Lee is now an Apache Flink committer. Congratulations! [17]

* Seth has published a post on the Apache Flink blog on Flink SQL DDL: "No
Java Required: Configured Sources and Sinks in SQL" [18]

* Upcoming Meetups
* On February 26th, Prateep Kumar will host an online 

Re: timestamp issue

2020-02-23 Thread Jark Wu
Hi Fei,

The Kafka source/sink does not support the TIMESTAMP(6) type; only precision 3
is supported, and a TIMESTAMP declared without a precision currently defaults
to precision 6. You therefore need to change TIMESTAMP in your DDL declaration
to TIMESTAMP(3).

Best,
Jark
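
A minimal sketch of the fix, assuming the rest of the original table
definition stays unchanged (connector properties elided):

```sql
CREATE TABLE user_log (
    user_id     VARCHAR,
    item_id     VARCHAR,
    category_id VARCHAR,
    behavior    VARCHAR,
    ts          TIMESTAMP(3)  -- explicit precision 3; a bare TIMESTAMP defaults to TIMESTAMP(6)
) WITH (
    ...
);
```

With precision 3 the declared type matches the TIMESTAMP(3) physical type
produced by the Kafka table source, so the ValidationException goes away.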

On Sun, 23 Feb 2020 at 15:44, Fei Han 
wrote:

>
> Hi, all:
> I ran the following DDL and SQL in Zeppelin, and it failed with the error below:
>   DDL:
> DROP TABLE IF EXISTS user_log ;
> CREATE TABLE user_log (
> user_id VARCHAR,
> item_id VARCHAR,
> category_id VARCHAR,
> behavior VARCHAR,
> ts TIMESTAMP
> ) WITH (
>   'connector.type' = 'kafka',
>   'connector.version' = 'universal',
>   'connector.topic' = 'ods',
>   'connector.startup-mode' = 'earliest-offset',
>   'connector.properties.zookeeper.connect'=
> 'fdw1:2181,fdw2:2181,fdw3:2181',
>   'connector.properties.bootstrap.servers' =
> 'fdw1:9092,fdw2:9092,fdww3:9092',
>   'connector.properties.group.id' = 'testGroup',
>   'connector.startup-mode' = 'earliest-offset',
>   'format.type' = 'json',
>   'update-mode' = 'append',
>   'format.derive-schema' = 'true');
>
> DROP TABLE IF EXISTS pvuv_sinks ;
> CREATE TABLE pvuv_sinks (
> dt VARCHAR,
> pv BIGINT,
> uv BIGINT
> ) WITH (
> 'connector.type' = 'jdbc', -- use the jdbc connector
> 'connector.url' = 'jdbc:mysql://fdw1:3306/flink', -- jdbc url
> 'connector.table' = 'pvuv_sinks', -- table name
> 'connector.username' = 'flink', -- username
> 'connector.password' = 'flink', -- password
> 'connector.write.flush.max-rows' = '5' -- default is 5000 rows; lowered for this demo
> )
> SQL:
>  INSERT INTO pvuv_sinks
> SELECT
>   DATE_FORMAT(ts, 'yyyy-MM-dd HH:00') dt,
>   COUNT(*) AS pv,
>   COUNT(DISTINCT user_id) AS uv
> FROM user_log
> GROUP BY DATE_FORMAT(ts, 'yyyy-MM-dd HH:00');
>
> The error:
>  org.apache.flink.table.api.ValidationException: Type TIMESTAMP(6) of
> table field 'ts' does not match with the physical type TIMESTAMP(3) of the
> 'ts' field of the TableSource return type.
>  at
> org.apache.flink.table.utils.TypeMappingUtils.lambda$checkPhysicalLogicalTypeCompatible$4(TypeMappingUtils.java:164)
>  at
> org.apache.flink.table.utils.TypeMappingUtils$1.defaultMethod(TypeMappingUtils.java:277)
>  at
> org.apache.flink.table.utils.TypeMappingUtils$1.defaultMethod(TypeMappingUtils.java:254)
>  at
> org.apache.flink.table.types.logical.utils.LogicalTypeDefaultVisitor.visit(LogicalTypeDefaultVisitor.java:132)
>  at
> org.apache.flink.table.types.logical.TimestampType.accept(TimestampType.java:151)
>  at
> org.apache.flink.table.utils.TypeMappingUtils.checkIfCompatible(TypeMappingUtils.java:254)
>  at
> org.apache.flink.table.utils.TypeMappingUtils.checkPhysicalLogicalTypeCompatible(TypeMappingUtils.java:160)
>  at
> org.apache.flink.table.utils.TypeMappingUtils.lambda$computeInCompositeType$8(TypeMappingUtils.java:232)
>  at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1321)
>  at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
>  at
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
>  at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>  at
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>  at
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>  at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>  at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>  at
> org.apache.flink.table.utils.TypeMappingUtils.computeInCompositeType(TypeMappingUtils.java:214)
>  at
> org.apache.flink.table.utils.TypeMappingUtils.computePhysicalIndices(TypeMappingUtils.java:192)
>  at
> org.apache.flink.table.utils.TypeMappingUtils.computePhysicalIndicesOrTimeAttributeMarkers(TypeMappingUtils.java:112)
>  at
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecTableSourceScan.computeIndexMapping(StreamExecTableSourceScan.scala:212)
>  at
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecTableSourceScan.translateToPlanInternal(StreamExecTableSourceScan.scala:107)
>  at
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecTableSourceScan.translateToPlanInternal(StreamExecTableSourceScan.scala:62)
>  at
> org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58)
>  at
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecTableSourceScan.translateToPlan(StreamExecTableSourceScan.scala:62)
>  at
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:54)
>  at
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalc.translateToPlanInternal(StreamExecCalc.scala:39)
>  at
> org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58)
>  at
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalcBase.translateToPlan(StreamExecCalcBase.scala:38)
>  at
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecExchange.translateToPlanInternal(StreamExecExchange.scala:84)
>  at
> 

Re: If some map-stage tasks are very slow and slow to checkpoint, will that block the reduce operator's subsequent processing?

2020-02-23 Thread Jark Wu
Hi Mark,

> Will taskA1 continue processing cp2 data? If so, will taskB process the cp2 data that taskA passes to it?
taskA1 will continue processing. In exactly-once mode, taskB will not process
the cp2 data that taskA passes to it. So if taskA2 is extremely slow, taskB
will eventually backpressure taskA1, and taskA1 will then also be unable to
keep processing data.

> Same question: if taskA itself is a reduce operation (keyBy) and taskB is a map operation, is the answer the same?
The answer is the same.
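
This exactly-once behaviour comes from aligned checkpoint barriers. A toy
Python sketch (not Flink code; the channel contents and names are made up)
of why the fast task's post-barrier data is held back:

```python
# Toy model of aligned checkpoint barriers: a downstream task reads two
# in-memory "channels". After the barrier arrives on the fast channel, records
# behind it (the next checkpoint's data) are buffered, not processed, until
# the matching barrier arrives on the slow channel. With bounded buffers,
# this is what eventually backpressures the fast upstream task.
from collections import deque

BARRIER = "barrier"

def aligned_read(fast_channel, slow_channel):
    processed, buffered = [], []
    fast, slow = deque(fast_channel), deque(slow_channel)
    barrier_seen = False
    while fast:
        item = fast.popleft()
        if item == BARRIER:
            barrier_seen = True       # cp1 barrier received from the fast task
        elif barrier_seen:
            buffered.append(item)     # cp2 data is held back
        else:
            processed.append(item)    # cp1 data flows through
    while slow:
        item = slow.popleft()
        if item == BARRIER:
            break                     # barriers aligned: the checkpoint completes
        processed.append(item)        # the slow task's remaining cp1 data
    processed.extend(buffered)        # only now is cp2 data processed
    return processed

# taskA1 is fast and already emitting cp2 data; taskA2 is still in cp1.
print(aligned_read(["a1", "a2", BARRIER, "a3-cp2"], ["b1", BARRIER]))
# → ['a1', 'a2', 'b1', 'a3-cp2']
```

The sketch ignores state snapshots and bounded buffers, but it shows the
ordering guarantee: no cp2 record crosses the operator before cp1 is aligned.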

Best,
Jark

On Sun, 23 Feb 2020 at 19:18, Mark Zang  wrote:

> Assume a simple map-and-reduce job: A is the map operator and B is the keyBy operator.
> A has two tasks, taskA1 and taskA2; B has a single task, taskB.
>
> If taskA2 runs extremely slowly, then after taskA1 finishes checkpoint cp1
> and notifies taskB, taskA1 has already started (or is allowed to start)
> processing data belonging to the next checkpoint, cp2.
>
> Meanwhile, taskA2 is still slowly working through cp1's data. At this point:
>
> Will taskA1 continue processing cp2 data?
> If so, will taskB process the cp2 data that taskA passes to it?
>
> Or do taskA1 and taskB both stop processing and wait for taskA2?
>
> Same question: if taskA itself is a reduce operation (keyBy) and taskB is a
> map operation, is the answer the same?
>
> Thanks!
>


If some map-stage tasks are very slow and slow to checkpoint, will that block the reduce operator's subsequent processing?

2020-02-23 Thread Mark Zang
Assume a simple map-and-reduce job: A is the map operator and B is the keyBy operator.
A has two tasks, taskA1 and taskA2; B has a single task, taskB.

If taskA2 runs extremely slowly, then after taskA1 finishes checkpoint cp1
and notifies taskB, taskA1 has already started (or is allowed to start)
processing data belonging to the next checkpoint, cp2.

Meanwhile, taskA2 is still slowly working through cp1's data. At this point:

Will taskA1 continue processing cp2 data?
If so, will taskB process the cp2 data that taskA passes to it?

Or do taskA1 and taskB both stop processing and wait for taskA2?

Same question: if taskA itself is a reduce operation (keyBy) and taskB is a
map operation, is the answer the same?

Thanks!


CfP: Workshop on Large Scale RDF Analytics (LASCAR-20) at ESWC'20

2020-02-23 Thread Hajira Jabeen

We apologize for cross-postings.
We appreciate your great help in forwarding this CFP to your
colleagues and friends.


Call for Papers & Posters for the 2nd Workshop on Large Scale RDF
Analytics (LASCAR-20), collocated with the Extended Semantic Web
Conference (ESWC) 2020

Venue & Dates:
Heraklion, Crete, Greece, May 31st, 2020

Workshop website: http://lascar.sda.tech

[Topics of interest]

LASCAR-20 seeks original articles and posters describing theoretical
and practical methods as well as techniques for performing scalable
analytics on knowledge graphs. All papers must be original and not
simultaneously submitted to another journal or conference. The
following paper and poster categories are welcome:

  * Decentralized KG data management including parsing, compression,
partitioning and smart indexing
  * Large scale KG enrichment using link prediction, entity
resolution, entity disambiguation or similarity estimation
  * Machine learning, e.g. clustering, blocking, or anomaly detection
  * Complex analytics with distributed KG embeddings
  * Connecting property Graphs with RDF and reasoning
  * Use-cases presenting semantic technologies at industry scale

[Important Dates]

Electronic submission of full papers:   February 28th, 2020
Notification of paper acceptance:   March 27th, 2020
Camera-ready of accepted papers:April 10th, 2020
Workshop day:   May 31st, 2020

[Submission Guidelines]

All papers should be formatted according to the standard LNCS Style.
All papers will be peer reviewed using the single-blind approach.
Authors of the accepted papers will be asked to register for the
workshop and will have the opportunity to present and participate in
the workshop. Long papers should not be longer than 10 pages including
the references, and short papers should not exceed 6 pages, including
all references. The accepted papers will be published online in CEUR
Workshop Proceedings (CEUR-WS.org). Proceedings will be available for
download after the conference. The pre-print will be made available
during the conference. The authors of the accepted posters will be
invited to present their posters at the workshop.

Please submit your work via EasyChair
(https://www.easychair.org/conferences/?conf=lascar20)

Dr.  Hajira Jabeen
Senior researcher,
SDA, Universität Bonn.

http://sda.cs.uni-bonn.de/people/dr-hajira-jabeen/


Batch reading from Cassandra

2020-02-23 Thread Lasse Nedergaard
Hi.

We would like to run some batch analytics on our data set stored in Cassandra
and are looking for an efficient way to load data from a single table. Not by
key, but a random 15%, 50%, or 100% of the rows.
Databricks has created an efficient way to load Cassandra data into Apache
Spark: they read from the underlying SSTables in order to load in parallel.
Does Flink have something similar, or what is the most efficient way to load
all, or a random subset of, the data from a single Cassandra table into Flink?
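
For reference, the parallel-load trick mentioned above boils down to scanning
disjoint token ranges in parallel. A hypothetical, driver-agnostic Python
sketch of generating such range queries (table and key names are placeholders;
it assumes the cluster's default Murmur3Partitioner):

```python
# Sketch only. Splits the Murmur3 token ring [-2^63, 2^63 - 1] into `splits`
# disjoint ranges and emits one CQL query per range, so a full-table scan can
# run as parallel source splits. Because tokens are hashes, keeping only a
# fraction of the ranges yields an approximately uniform random sample of rows.
MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1

def token_range_queries(table, partition_key, splits, fraction=1.0):
    width = (MAX_TOKEN - MIN_TOKEN) // splits
    n_keep = max(1, round(splits * fraction))  # how many ranges to sample
    queries = []
    for i in range(n_keep):
        lo = MIN_TOKEN + i * width
        # Boundary handling is approximate here; a production version should
        # close the last range with <= MAX_TOKEN.
        hi = MAX_TOKEN if i == splits - 1 else lo + width
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE token({partition_key}) >= {lo} AND token({partition_key}) < {hi}"
        )
    return queries

# An approximate random 50% of the table, as 4 of 8 parallel range scans:
for query in token_range_queries("user_log", "user_id", 8, fraction=0.5):
    print(query)
```

Each query could then feed one parallel source instance (e.g. via Flink's
Cassandra input format), rather than funneling the whole table through a
single scan.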

Any suggestions and/or recommendations are highly appreciated.

Thanks in advance

Lasse Nedergaard