The Broadcast State pattern <https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/state/broadcast_state.html#the-broadcast-state-pattern> may be of interest to you.

On 08.02.2019 15:57, Aggarwal, Ajay wrote:

Yes, another KeyBy will be used. The “small size” messages will be strings of length 500 to 1000.

Is there a concept of “global” state in Flink? Is it possible to keep these lists in global state and pass only a reference to the list (by name?) in the LargeMessage?

From: Chesnay Schepler <[email protected]>
Date: Friday, February 8, 2019 at 8:45 AM
To: "Aggarwal, Ajay" <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: stream of large objects

Whether a LargeMessage is serialized depends on how the job is structured.
For example, if you only apply map/filter functions after the aggregation, the messages are unlikely to be serialized.
If you apply another keyBy, they will be serialized again.
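
To get a rough feel for what one such serialization step costs, here is a minimal sketch that measures the byte size of a LargeMessage holding 1000 strings of 1000 characters each (the upper bounds mentioned in this thread). It uses plain Java serialization as a stand-in, which is an assumption: Flink uses its own serializers, so the real numbers will differ. The class name LargeMessageSize and the helpers build and serializedSize are hypothetical names made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LargeMessageSize {

    // Mirrors the LargeMessage structure from the original question.
    static class LargeMessage implements Serializable {
        private static final long serialVersionUID = 1L;
        String key;
        List<String> messages = new ArrayList<>();
    }

    // Hypothetical helper: builds a message with `count` strings of `length` ASCII characters.
    static LargeMessage build(int count, int length) {
        LargeMessage m = new LargeMessage();
        m.key = "some-key";
        char[] chars = new char[length];
        Arrays.fill(chars, 'x');
        for (int i = 0; i < count; i++) {
            // A distinct String object per entry, so each one is written in full.
            m.messages.add(new String(chars));
        }
        return m;
    }

    // Hypothetical helper: byte count under plain Java serialization (NOT Flink's serializers).
    static int serializedSize(LargeMessage m) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(m);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        LargeMessage m = build(1000, 1000);
        System.out.println("approx. serialized bytes: " + serializedSize(m));
    }
}
```

With these inputs the payload alone is about 1 MB per message, which is the amount that would be written and read again at every keyBy boundary.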

When you say "small size" messages, what are we talking about here?

On 07.02.2019 20:37, Aggarwal, Ajay wrote:

    In my use case, my source stream contains small messages, but as
    part of Flink processing I will be aggregating them into large
    messages, and further processing will happen on these large
    messages. The structure of this large message will be something
    like this:

       import java.util.List;

       class LargeMessage {
           String key;

           // this is where the aggregation of smaller messages happens
           List<String> messages;
       }

    In some cases, this list field of LargeMessage can get very large
    (thousands of messages). Is it OK to create an intermediate stream
    of these LargeMessages? What should I be concerned about while
    designing the Flink job, specifically with parallelism in mind? As
    these LargeMessages flow from one Flink subtask to another, do
    they get serialized/deserialized?

    Thanks.

