Correlation between data streams/operators and threads

Shailesh Jain Thu, 09 Nov 2017 01:18:58 -0800

Hi,

I'm trying to understand the runtime aspect of Flink when dealing with
multiple data streams and multiple operators per data stream.

Use case: N data streams in a single Flink job (each data stream represents
one device, with different time latencies). Each of these data streams is
split into two streams: one goes into a set of CEP operators, and the other
into a process function.

Questions:
1. At runtime, will the engine create one thread per data stream? Or one
thread per operator?
2. Is it possible to create data streams dynamically when the job starts?
For example, if N is read from a file at job startup, can the corresponding
N streams be created at that point?
3. Are there any specific performance impacts when a large number of
streams (N ~ 10000) are created, as opposed to N partitions within a single
stream?
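
For comparison with question 3, the "N partitions within a single stream"
alternative can be modelled in plain Java. This is a toy sketch under stated
assumptions, not Flink code: the Event type, partitionFor, and the
parallelism of 8 are hypothetical, and it does not reflect how Flink
actually assigns keys to parallel subtasks. It only shows one stream that
carries all N devices being split by device ID into a fixed number of
partitions.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (NOT Flink code): one stream of events from many devices,
// split by device ID into a fixed number of partitions,
// instead of one separate stream per device.
public class DevicePartitioningSketch {

    // Hypothetical event type: the producing device and one reading.
    record Event(String deviceId, long timestamp, double value) {}

    // Hypothetical helper: map a device ID to one of `parallelism` partitions.
    // floorMod keeps the result non-negative even when hashCode() is negative.
    static int partitionFor(String deviceId, int parallelism) {
        return Math.floorMod(deviceId.hashCode(), parallelism);
    }

    // Route every event of the single input stream to its partition.
    // All events of one device always land in the same partition.
    static List<List<Event>> partition(List<Event> stream, int parallelism) {
        List<List<Event>> partitions = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Event e : stream) {
            partitions.get(partitionFor(e.deviceId(), parallelism)).add(e);
        }
        return partitions;
    }

    public static void main(String[] args) {
        int numDevices = 10_000; // N from question 3
        int parallelism = 8;     // hypothetical number of parallel workers
        List<Event> stream = new ArrayList<>();
        for (int d = 0; d < numDevices; d++) {
            stream.add(new Event("device-" + d, d, d * 0.5));
        }
        List<List<Event>> partitions = partition(stream, parallelism);
        int total = 0;
        for (int i = 0; i < parallelism; i++) {
            int size = partitions.get(i).size();
            System.out.println("partition " + i + ": " + size + " events");
            total += size;
        }
        System.out.println("total: " + total);
    }
}
```

In this model the number of partitions is set by the chosen parallelism
rather than by N, and any per-device state would stay within a single
partition.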

Are there any internal (design) documents that could help in understanding
the implementation details? Pointers to the relevant parts of the source
code would also be very helpful.

Thanks in advance.

Shailesh
