Yes, the problem is that the Java API inadvertently requires an
Iterable return value, not an Iterator:
https://issues.apache.org/jira/browse/SPARK-3369 I think this can't be
fixed until Spark 2.x.
It seems possible to cheat and return a wrapper like the
"IteratorIterable" I posted in the JIRA. Yo
A number of the problems I want to work with generate datasets that are
too large to hold in memory. This becomes an issue when building a
FlatMapFunction, and also when the data used in combineByKey cannot be held
in memory.
The following is a simple, if a little silly, example of such a
FlatMapFunction.
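As a hedged sketch of the idea (my own illustration, not code from this thread): in Spark 1.x the Java `FlatMapFunction<T, R>` must return an `Iterable<R>`, so a memory-friendly `call(n)` would return a lazily generated sequence, e.g. via an IteratorIterable-style wrapper around an iterator like this hypothetical `RangeIterator`, which expands a single input `n` into `0..n-1` without ever materializing a list:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical lazy iterator: produces 0, 1, ..., limit-1 on demand.
// Each element is computed in next(), so memory use is O(1) no matter
// how large the expanded sequence is.
public class RangeIterator implements Iterator<Integer> {
    private final int limit;
    private int next = 0;

    public RangeIterator(int limit) {
        this.limit = limit;
    }

    @Override
    public boolean hasNext() {
        return next < limit;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return next++;
    }
}
```

The FlatMapFunction body would then be something like `return new IteratorIterable<>(new RangeIterator(n));`, trading the Iterable that the API insists on for lazy, constant-memory generation underneath.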