Re: chained operator with different parallelism question

2018-05-18 Thread Fabian Hueske
Functions with different parallelism cannot be chained. Chaining means that Functions are fused into a single operator and that records are passed by method calls (instead of serializing them into an in-memory or network channel). Hence, chaining does not work if you have different parallelism.

chained operator with different parallelism question

2018-05-18 Thread makeyang
someStream.filter(...).map(...).map(...); there operators are supposed to chained. but what if there are set different parallelism like below: someStream.filter(...).setParallelism(X).map(...).setParallelism(Y).map(...).setParallelism(Z); X != Y != Z what will happen? -- Sent from:

Re: Parallelism question

2015-04-16 Thread Fabian Hueske
Hi Giacomo, as Max said, you can sort the data within a partition. However, data across partitions is not sorted. It is either random partitioned or hash-partitioned (all records that share some property are in the same partition). Producing fully ordered output, where the first partition has

Parallelism question

2015-04-14 Thread Giacomo Licari
Hi guys, I have a question about how parallelism works. If I have a large dataset and I would divide it into 5 blocks, can I pass each block of data to a fixed parallel process (for example I set up 5 process) ? And if the results data from each process arrive to the output not in an ordered

Re: Parallelism question

2015-04-14 Thread Maximilian Michels
Hi Giacomo, If I understand you correctly, you want your Flink job to execute with a parallelism of 5. Just call setDegreeOfParallelism(5) on your ExecutionEnvironment. That way, all operations, when possible, will be performed using 5 parallel instances. This is also true for the DataSink which

Re: Parallelism question

2015-04-14 Thread Giacomo Licari
Hi Max, thank you for your reply. DataSink contains data ordered, I mean, it contains in order output1, output1 ... output5? Or are them mixed? Thanks a lot, Giacomo On Tue, Apr 14, 2015 at 11:58 AM, Maximilian Michels m...@apache.org wrote: Hi Giacomo, If I understand you correctly, you

Re: Parallelism question

2015-04-14 Thread Maximilian Michels
Hi Giacomo, If you use a FileOutputFormat as a DataSink (e.g. as in env.writeAsText(/path), then the output directory will contain 5 files named 1, 2, 3, 4, and 5, each containing the output of the corresponding task. The order of the data in the files follows the order of the distributed