subject:"Parallelism question"

Re: chained operator with different parallelism question

2018-05-18 Thread Fabian Hueske

Functions with different parallelism cannot be chained. Chaining means that Functions are fused into a single operator and that records are passed by method calls (instead of serializing them into an in-memory or network channel). Hence, chaining does not work if you have different parallelism.

chained operator with different parallelism question

2018-05-18 Thread makeyang

someStream.filter(...).map(...).map(...); there operators are supposed to chained. but what if there are set different parallelism like below: someStream.filter(...).setParallelism(X).map(...).setParallelism(Y).map(...).setParallelism(Z); X != Y != Z what will happen? -- Sent from:

Re: Parallelism question

2015-04-16 Thread Fabian Hueske

Hi Giacomo, as Max said, you can sort the data within a partition. However, data across partitions is not sorted. It is either random partitioned or hash-partitioned (all records that share some property are in the same partition). Producing fully ordered output, where the first partition has

Parallelism question

2015-04-14 Thread Giacomo Licari

Hi guys, I have a question about how parallelism works. If I have a large dataset and I would divide it into 5 blocks, can I pass each block of data to a fixed parallel process (for example I set up 5 process) ? And if the results data from each process arrive to the output not in an ordered

Re: Parallelism question

2015-04-14 Thread Maximilian Michels

Hi Giacomo, If I understand you correctly, you want your Flink job to execute with a parallelism of 5. Just call setDegreeOfParallelism(5) on your ExecutionEnvironment. That way, all operations, when possible, will be performed using 5 parallel instances. This is also true for the DataSink which

Re: Parallelism question

2015-04-14 Thread Giacomo Licari

Hi Max, thank you for your reply. DataSink contains data ordered, I mean, it contains in order output1, output1 ... output5? Or are them mixed? Thanks a lot, Giacomo On Tue, Apr 14, 2015 at 11:58 AM, Maximilian Michels m...@apache.org wrote: Hi Giacomo, If I understand you correctly, you

Re: Parallelism question

2015-04-14 Thread Maximilian Michels

Hi Giacomo, If you use a FileOutputFormat as a DataSink (e.g. as in env.writeAsText(/path), then the output directory will contain 5 files named 1, 2, 3, 4, and 5, each containing the output of the corresponding task. The order of the data in the files follows the order of the distributed

Re: chained operator with different parallelism question

chained operator with different parallelism question

Re: Parallelism question

Parallelism question

Re: Parallelism question

Re: Parallelism question

Re: Parallelism question

7 matches

Site Navigation

Mail list logo

Footer information