Hi all, I am wondering whether there is a way to ensure that two consecutive maps inside mapPartitions will not be chained together.
To illustrate my question I prepared a short example:

    rdd.mapPartitions(it => { it.map(x => foo(x)).map(y => y.getResult) })

I would like to ensure that foo is applied to all records in the partition first, and only after that is getResult invoked on each record. This could be beneficial when foo is a time-consuming IO operation, e.g. a request to an external service for data that cannot be prefetched. I know that converting the iterator into a list will do the job, but maybe there is a cleverer way of doing it.
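To make the behaviour concrete, here is a minimal sketch using plain Scala iterators in place of a real partition. Everything in it is a placeholder I made up for illustration: Response, the log buffer, and the bodies of foo and getResult are assumptions, not the real code. It only shows the order of calls with the chained maps versus the list conversion I mentioned.

```scala
import scala.collection.mutable.ArrayBuffer

object EagerInsidePartition {
  // Hypothetical stand-in for whatever the slow external call returns.
  final case class Response(id: Int, log: ArrayBuffer[String]) {
    def getResult: Int = { log += s"getResult($id)"; id * 10 }
  }

  // Hypothetical stand-in for the time-consuming IO operation.
  def foo(x: Int, log: ArrayBuffer[String]): Response = {
    log += s"foo($x)"
    Response(x, log)
  }

  // Chained iterator maps: foo and getResult interleave record by record.
  def chained(it: Iterator[Int], log: ArrayBuffer[String]): Iterator[Int] =
    it.map(x => foo(x, log)).map(y => y.getResult)

  // Materialising the intermediate results forces every foo call to run
  // before the first getResult call.
  def eager(it: Iterator[Int], log: ArrayBuffer[String]): Iterator[Int] =
    it.map(x => foo(x, log)).toList.iterator.map(y => y.getResult)

  def main(args: Array[String]): Unit = {
    val chainedLog = ArrayBuffer.empty[String]
    chained(Iterator(1, 2), chainedLog).foreach(_ => ())
    println(chainedLog.mkString(", ")) // foo(1), getResult(1), foo(2), getResult(2)

    val eagerLog = ArrayBuffer.empty[String]
    eager(Iterator(1, 2), eagerLog).foreach(_ => ())
    println(eagerLog.mkString(", ")) // foo(1), foo(2), getResult(1), getResult(2)
  }
}
```

Inside Spark the body of eager would go into the function passed to mapPartitions. The drawback, as far as I understand it, is that the whole partition's intermediate results are held in memory at once, which is why I am asking whether something better exists.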