Hi all,

I am wondering whether there is a way to ensure that two consecutive maps
inside mapPartitions will not be chained together.

To illustrate my question I prepared a short example:

rdd.mapPartitions(it => {
    // Iterator.map is lazy, so both maps are applied record by record
    it.map(x => foo(x)).map(y => y.getResult)
})

I would like to ensure that the foo method is applied to all records in the
partition, and that getResult is invoked on each record only after that. This
could be beneficial when foo is a time-consuming IO operation, e.g. a
request to an external service for data (data that couldn't be
prefetched).

I know that converting the iterator into a list will do the job, but maybe
there is a cleverer way of doing it.
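
For reference, the list-conversion workaround can be sketched in plain Scala,
without Spark, since the function passed to mapPartitions only sees an
Iterator. This is a minimal sketch under assumptions: the bodies of foo and
getResult, the Response class, the log buffer and the names chained and staged
are all hypothetical stand-ins, used only to make the call order visible.

```scala
import scala.collection.mutable.ArrayBuffer

object EagerPartition {
  // Hypothetical call log, used only to observe evaluation order.
  val log: ArrayBuffer[String] = ArrayBuffer.empty[String]

  // Hypothetical stand-in for whatever foo returns in the real job.
  final class Response(value: Int) {
    def getResult: Int = {
      log += s"getResult($value)"
      value * 10
    }
  }

  // Hypothetical stand-in for the slow IO call.
  def foo(x: Int): Response = {
    log += s"foo($x)"
    new Response(x)
  }

  // Lazy: the two maps are chained, so foo and getResult
  // alternate record by record.
  def chained(it: Iterator[Int]): Iterator[Int] =
    it.map(foo).map(_.getResult)

  // Eager first stage: toVector drains the iterator, so foo runs
  // for every record before the first getResult call.
  def staged(it: Iterator[Int]): Iterator[Int] =
    it.map(foo).toVector.iterator.map(_.getResult)
}
```

With an RDD[Int] this would presumably be used as
rdd.mapPartitions(EagerPartition.staged). Note that toVector holds all
intermediate results of a partition in memory at once, so this only suits
partitions that fit in executor memory.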



