Re: Multiple (non-consecutive) keyBy operators in a dataflow

李玥 Mon, 02 Apr 2018 18:22:34 -0700

Hello,
In my opinion, it would be worthwhile only in this situation:
1. The total size of all your state is large, e.g. 1 GB or more.
2. Splitting your job into multiple keyBy stages would reduce the size of your
state.


This is because saving state (checkpointing) is synchronous: all working threads
are blocked until the save operation has finished.
Our team is working on making state saving asynchronous; please refer to:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Slow-flink-checkpoint-td18946.html
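To make the state layout of the dataflow in Andre's question concrete, here is a minimal plain-Java model of it. It does not use the Flink API: the class, the Event type, the sub-id field and the hash-based subtaskFor() mapping are hypothetical stand-ins (Flink's real key-to-subtask assignment works differently). It only shows how state is scoped: per coarse key after the first keyBy, and per derived key after the second one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java model (no Flink dependency) of:
 *   Source -> keyBy(coarse) -> stateful step deriving a finer key -> keyBy(fine) -> stateful step.
 * All names are hypothetical; this only illustrates how keyed state is scoped.
 */
public class TwoStageKeyByModel {

    /** Hypothetical input record: a coarse key plus a sub-id used to derive the finer key. */
    public static final class Event {
        final String coarseKey;
        final String subId;
        final long value;

        public Event(String coarseKey, String subId, long value) {
            this.coarseKey = coarseKey;
            this.subId = subId;
            this.value = value;
        }
    }

    private final int parallelism;
    // One state map per parallel subtask and per stage, modeling partitioned keyed state.
    private final List<Map<String, Long>> coarseState = new ArrayList<>();
    private final List<Map<String, Long>> fineState = new ArrayList<>();

    public TwoStageKeyByModel(int parallelism) {
        this.parallelism = parallelism;
        for (int i = 0; i < parallelism; i++) {
            coarseState.add(new HashMap<>());
            fineState.add(new HashMap<>());
        }
    }

    // Simplified stand-in for key partitioning (assumption, not Flink's algorithm):
    // any deterministic key -> subtask mapping is enough for this model.
    private int subtaskFor(String key) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public void process(Event e) {
        // Stage 1 (after keyBy on the coarse key): count events per coarse key,
        // then derive the more granular key.
        coarseState.get(subtaskFor(e.coarseKey)).merge(e.coarseKey, 1L, Long::sum);
        String fineKey = e.coarseKey + "/" + e.subId;

        // Stage 2 (after keyBy on the derived key): sum values per fine key.
        fineState.get(subtaskFor(fineKey)).merge(fineKey, e.value, Long::sum);
    }

    public long coarseCount(String coarseKey) {
        return coarseState.get(subtaskFor(coarseKey)).getOrDefault(coarseKey, 0L);
    }

    public long fineValue(String fineKey) {
        return fineState.get(subtaskFor(fineKey)).getOrDefault(fineKey, 0L);
    }

    /** Number of state entries a stage holds across all subtasks. */
    private static int entries(List<Map<String, Long>> stage) {
        int n = 0;
        for (Map<String, Long> m : stage) {
            n += m.size();
        }
        return n;
    }

    public int coarseEntries() {
        return entries(coarseState);
    }

    public int fineEntries() {
        return entries(fineState);
    }

    public static void main(String[] args) {
        TwoStageKeyByModel model = new TwoStageKeyByModel(4);
        model.process(new Event("customer-1", "a", 10));
        model.process(new Event("customer-1", "b", 5));
        model.process(new Event("customer-1", "a", 1));
        model.process(new Event("customer-2", "a", 7));
        System.out.println("coarse-key state entries: " + model.coarseEntries()); // 2
        System.out.println("fine-key state entries:   " + model.fineEntries());   // 3
    }
}
```

Running main() prints 2 coarse-key entries and 3 fine-key entries; each stage-2 entry holds only the data for one derived key (e.g. "customer-1/a" sums to 11).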

LiYue
http://tig.jd.com
liyue2...@gmail.com



> On Apr 3, 2018, at 8:30 AM, au.fp2018 <au.fp2...@gmail.com> wrote:
> 
> Hello Flink Community,
> 
> I am relatively new to Flink. In the project I am currently working on, I have
> a dataflow with a keyBy() operator, which I want to convert into a dataflow with
> multiple keyBy() operators like this:
> 
> 
>  Source -->
>  KeyBy() -->
>  Stateful process() function that generates a more granular key -->
>  KeyBy(<id generated in the previous step>) -->
>  More stateful computation(s) -->
>  Sink
> 
> Are there any downsides to this approach?
> My reasoning behind the second keyBy() is to reduce the amount of state and
> hence improve the processing speed.
> 
> Thanks,
> Andre
> 
