Hi Kostas, thanks for the quick reply.
> If T_1 must be processed before T_i, i>1, then you cannot parallelize the
> algorithm.

What would be the best way to process it anyway? DataSet.collect() -> loop over List -> env.fromCollection(...)? Or with a parallelism of 1 and a .map(...)?

However, this approach would collect all data at one node and wouldn't scale, correct?

Regards,
Sebastian