Hi,

Broadcasting will break an operator chain. However, my best guess is that the Kafka 
source will still be the performance bottleneck in your job. Also, network 
exchanges add measurable overhead only if your records are very 
lightweight and cheap to process (for example, if you are using RocksDB, state 
access will most likely dominate, so you can just ignore network costs).
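To illustrate why the chain breaks, here is a toy model, loosely based on the kind of checks Flink 1.x performs in `StreamingJobGraphGenerator` when deciding whether to chain two operators. It is NOT Flink code: `ChainingModel`, `Edge`, `Partitioning` and the fixed parallelism of 4 are hypothetical and only capture three of the real conditions (downstream has exactly one input, forward partitioning, equal parallelism). Connecting a stream with a broadcast stream gives the downstream operator a second input, which fails the first condition.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of Flink 1.x operator-chaining rules (not Flink code). */
public class ChainingModel {

    /** Simplified partitioning strategies on an edge between two operators. */
    enum Partitioning { FORWARD, REBALANCE, BROADCAST }

    /** Assumption: all operators share this parallelism, as in the question. */
    static final int PARALLELISM = 4;

    /** A directed edge from an upstream operator to a downstream operator. */
    static final class Edge {
        final String from;
        final String to;
        final Partitioning partitioning;

        Edge(String from, String to, Partitioning partitioning) {
            this.from = from;
            this.to = to;
            this.partitioning = partitioning;
        }
    }

    /** Counts the edges that end at the given operator. */
    public static int inputCount(List<Edge> edges, String operator) {
        int n = 0;
        for (Edge e : edges) {
            if (e.to.equals(operator)) {
                n++;
            }
        }
        return n;
    }

    /** In this model an edge is chainable only if all three simplified conditions hold. */
    public static boolean canChain(List<Edge> edges, Edge edge, int upParallelism, int downParallelism) {
        return inputCount(edges, edge.to) == 1
                && edge.partitioning == Partitioning.FORWARD
                && upParallelism == downParallelism;
    }

    /** Builds OperatorA -> OperatorB -> OperatorC, optionally with a broadcast input into each. */
    public static List<Edge> pipeline(boolean withBroadcast) {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge("OperatorA", "OperatorB", Partitioning.FORWARD));
        edges.add(new Edge("OperatorB", "OperatorC", Partitioning.FORWARD));
        if (withBroadcast) {
            edges.add(new Edge("SchemaSource", "OperatorA", Partitioning.BROADCAST));
            edges.add(new Edge("SchemaSource", "OperatorB", Partitioning.BROADCAST));
            edges.add(new Edge("SchemaSource", "OperatorC", Partitioning.BROADCAST));
        }
        return edges;
    }

    /** Returns how many of the main-path (forward) edges would be chained. */
    public static int chainedMainEdges(boolean withBroadcast) {
        List<Edge> edges = pipeline(withBroadcast);
        int chained = 0;
        for (Edge e : edges) {
            if (e.partitioning == Partitioning.FORWARD
                    && canChain(edges, e, PARALLELISM, PARALLELISM)) {
                chained++;
            }
        }
        return chained;
    }

    public static void main(String[] args) {
        System.out.println("chained edges without broadcast: " + chainedMainEdges(false));
        System.out.println("chained edges with broadcast:    " + chainedMainEdges(true));
    }
}
```

In this model both main-path edges are chained without the broadcast input, and neither is chained once OperatorB and OperatorC each receive a second (broadcast) input.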

Either way, you can just try this out. Pre-populate your Kafka topic with a 
significant number of messages, run both jobs, compare the throughput and 
decide based on those results whether this is OK for you or not.
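For the comparison step, a minimal sketch, assuming you note for each run how many records were consumed from the pre-populated topic and how long the job took to drain it. `ThroughputComparison` and its methods are hypothetical helpers, and the numbers in `main` are illustrative placeholders, not measurements.

```java
/** Hypothetical helper for comparing the throughput of two benchmark runs. */
public class ThroughputComparison {

    /** Records per second for a run that processed {@code records} in {@code elapsedMillis}. */
    public static double recordsPerSecond(long records, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return records * 1000.0 / elapsedMillis;
    }

    /**
     * Relative slowdown of the candidate run versus the baseline, as a fraction:
     * 0.0 means equal throughput, 0.25 means 25% fewer records per second.
     */
    public static double relativeSlowdown(double baselineRps, double candidateRps) {
        return (baselineRps - candidateRps) / baselineRps;
    }

    public static void main(String[] args) {
        // Placeholder numbers only: both runs drain the same 10 million record backlog.
        double chained = recordsPerSecond(10_000_000L, 50_000L);
        double broadcast = recordsPerSecond(10_000_000L, 62_500L);
        System.out.printf("slowdown: %.1f%%%n", 100 * relativeSlowdown(chained, broadcast));
    }
}
```

Because both jobs read the same fixed backlog, total records divided by elapsed time gives a like-for-like comparison; whether a given slowdown is acceptable is your call.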

Piotrek 

> On 6 Aug 2019, at 09:56, 黄兆鹏 <[email protected]> wrote:
> 
> Hi all, 
> My Flink job handles data with a dynamic schema, so I want to consume a schema 
> Kafka topic and broadcast it to every operator so that each operator knows 
> what kind of data it is handling.
> 
> For example, the two streams look like this:
> OperatorA  ->  OperatorB  ->  OperatorC
>     ^              ^              ^
>     |              |              |
>     +--------------+--------------+
>                    |
>             BroadcastStream
> 
> If the broadcast stream does not exist, OperatorA, OperatorB and OperatorC are 
> chained together in one slot because they have the same parallelism, so the 
> job gets maximum performance.
> 
> I was wondering whether adding the broadcast stream will affect 
> performance. Or will Flink still chain the operators together to get maximum 
> performance? 
> 
> Thanks!
