Yes, the problem is that the Java API inadvertently requires an
Iterable return value, not an Iterator:
https://issues.apache.org/jira/browse/SPARK-3369 I think this can't be
fixed until Spark 2.x.
It seems possible to cheat and return a wrapper like the
"IteratorIterable" I posted in the JIRA. Yo
A number of the problems I want to work with generate datasets that are
too large to hold in memory. This becomes an issue when building a
FlatMapFunction, and also when the data used in combineByKey cannot be held
in memory.
The following is a simple, if a little silly, example of such a
FlatMapFunction.
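As a hedged sketch of the idea (my own illustration, not code from this thread): in Spark 1.x the Java `FlatMapFunction<T, R>` must return an `Iterable<R>`, so a memory-friendly `call(n)` would return a lazily generated sequence, e.g. via an IteratorIterable-style wrapper around an iterator like this hypothetical `RangeIterator`, which expands a single input `n` into `0..n-1` without ever materializing a list:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical lazy iterator: produces 0, 1, ..., limit-1 on demand.
// Each element is computed in next(), so memory use is O(1) no matter
// how large the expanded sequence is.
public class RangeIterator implements Iterator<Integer> {
    private final int limit;
    private int next = 0;

    public RangeIterator(int limit) {
        this.limit = limit;
    }

    @Override
    public boolean hasNext() {
        return next < limit;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return next++;
    }
}
```

The FlatMapFunction body would then be something like `return new IteratorIterable<>(new RangeIterator(n));`, trading the Iterable that the API insists on for lazy, constant-memory generation underneath.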