In my use case, the source stream contains small messages, but as part of
Flink processing I will be aggregating them into large messages, and further
processing will happen on these large messages. The structure of a large
message will be something like this:
class LargeMessage {
    String key;
    // this is where the aggregation of smaller messages happens
    List<String> messages;
}
In some cases the list field of LargeMessage can get very large (thousands of
messages). Is it OK to create an intermediate stream of these LargeMessages?
What should I be concerned about while designing the Flink job, specifically
with parallelism in mind? As these LargeMessages flow from one Flink subtask to
another, do they get serialized and deserialized?
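For illustration, here is a minimal plain-Java sketch of the aggregation I have
in mind. It uses no Flink APIs; SmallMessage, LargeMessageSketch, and aggregate
are hypothetical names made up for this example, and in the real job the
grouping would happen per key inside Flink rather than in a local map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LargeMessageSketch {

    // Same shape as the LargeMessage class described above.
    static class LargeMessage {
        String key;
        List<String> messages = new ArrayList<>();

        LargeMessage(String key) {
            this.key = key;
        }
    }

    // Hypothetical small input record: a key plus a payload.
    static class SmallMessage {
        final String key;
        final String payload;

        SmallMessage(String key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    // Groups small messages by key, appending each payload to that key's list.
    static Map<String, LargeMessage> aggregate(List<SmallMessage> input) {
        Map<String, LargeMessage> byKey = new LinkedHashMap<>();
        for (SmallMessage m : input) {
            byKey.computeIfAbsent(m.key, LargeMessage::new).messages.add(m.payload);
        }
        return byKey;
    }

    public static void main(String[] args) {
        List<SmallMessage> input = Arrays.asList(
                new SmallMessage("a", "m1"),
                new SmallMessage("b", "m2"),
                new SmallMessage("a", "m3"));
        for (LargeMessage lm : aggregate(input).values()) {
            System.out.println(lm.key + " -> " + lm.messages);
        }
    }
}
```

The point of the sketch is only that one LargeMessage per key keeps growing as
small messages arrive, so the size of each record is unbounded unless the
aggregation is capped or windowed.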
Thanks.