Functions with different parallelism cannot be chained.
Chaining means that Functions are fused into a single operator and that
records are passed by method calls (instead of serializing them into an
in-memory or network channel).
Hence, chaining does not work if you have different parallelism.
someStream.filter(...).map(...).map(...);
there operators are supposed to chained.
but what if there are set different parallelism like below:
someStream.filter(...).setParallelism(X).map(...).setParallelism(Y).map(...).setParallelism(Z);
X != Y != Z
what will happen?
--
Sent from:
Hi Giacomo,
as Max said, you can sort the data within a partition.
However, data across partitions is not sorted. It is either random
partitioned or hash-partitioned (all records that share some property are
in the same partition). Producing fully ordered output, where the first
partition has
Hi guys,
I have a question about how parallelism works.
If I have a large dataset and I would divide it into 5 blocks, can I pass
each block of data to a fixed parallel process (for example I set up 5
process) ?
And if the results data from each process arrive to the output not in an
ordered
Hi Giacomo,
If I understand you correctly, you want your Flink job to execute with a
parallelism of 5. Just call setDegreeOfParallelism(5) on your
ExecutionEnvironment. That way, all operations, when possible, will be
performed using 5 parallel instances. This is also true for the DataSink
which
Hi Max,
thank you for your reply.
DataSink contains data ordered, I mean, it contains in order output1,
output1 ... output5? Or are them mixed?
Thanks a lot,
Giacomo
On Tue, Apr 14, 2015 at 11:58 AM, Maximilian Michels m...@apache.org wrote:
Hi Giacomo,
If I understand you correctly, you
Hi Giacomo,
If you use a FileOutputFormat as a DataSink (e.g. as in
env.writeAsText(/path), then the output directory will contain 5 files
named 1, 2, 3, 4, and 5, each containing the output of the corresponding
task. The order of the data in the files follows the order of the
distributed