Flink has support for spillable intermediate results. Currently they are only set if necessary to avoid pipeline deadlocks.
You can force this via env.getConfig().setExecutionMode(ExecutionMode.BATCH); This will write shuffles to disk, but you don't get the fine-grained control you probably need for your use case. – Ufuk On Thu, May 5, 2016 at 3:29 PM, Paschek, Robert <robert.pasc...@tu-berlin.de> wrote: > Hi Mailing List, > > > > I want to write and read intermediates to/from disk. > > The following foo- codesnippet may illustrate my intention: > > > > public void mapPartition(Iterable<T> tuples, Collector<T> out) { > > > > for (T tuple : tuples) { > > > > if (Condition) > > out.collect(tuple); > > else > > writeTupleToDisk > > } > > > > While (‘TupleOnDisk’) > > out.collect(‘ReadNextTupleFromDisk’); > > } > > > > I'am wondering if flink provides an integrated class for this purpose. I > also have to precise identify the files with the intermediates due > parallelism of mapPartition. > > > > > > Thank you in advance! > > Robert