Hi!
Flink has a non-batch exchange way to break pipelines, which is by now
quite custom for iterations. It is used there for constructs that fork and
re-join the flow.
The proper batch-exchange is better, because the scheduler can exploit
that, but is is not yet usable in iterations.
Stephan
Hi Fabian,
thanks for your explanation!
Yeah, I figured that if an easy fix exists, you would have done that yourself.
This is more for me to understand the conceptual problem.
But back to the pipeline-requirement: Doesn't zipWithIndex violate that too,
then? It's also a mapPartitions, collec
Hi Fridtjof,
the range partitioner works by building a histogram for the partitioning
key. This requires a pass over the whole intermediate data set which means
it needs to be materialized and cannot be processed in a pipelined fashion.
However, pipelined data exchange strategies are a requirement