Hi all, I am using mapPartitions to do some heavy computation on subsets of the data. I have a dataset of about 1 million rows, running on a 32-core cluster. Unfortunately, it seems that mapPartitions is splitting the data into only two partitions, so the work runs on only two cores.
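For context, here is a minimal sketch of the pattern I'm using. The dataset, app name, and per-partition computation are placeholders (my real data is loaded from storage and comes back as two partitions), and the repartition call is just my guess at a fix, not something I've confirmed:

import org.apache.spark.{SparkConf, SparkContext}

object MapPartitionsExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mapPartitions-demo"))

    // Placeholder for my ~1M-row dataset; in my real job the loaded RDD
    // reports only two partitions via data.partitions.length.
    val data = sc.parallelize(1 to 1000000)

    // Is explicitly repartitioning to (at least) the cluster's core count
    // the right way to force mapPartitions onto more cores?
    val chunked = data.repartition(32)

    val result = chunked.mapPartitions { iter =>
      // Stand-in for the heavy per-partition computation.
      Iterator(iter.map(_.toLong).sum)
    }

    println(result.collect().mkString(", "))
    sc.stop()
  }
}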
Is there a way to force it to split into smaller chunks?

Thanks,