Use RDD.repartition to split the data into more partitions before calling mapPartitions (see http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD ).
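Here's a minimal sketch. The input path, app name, and partition count are illustrative; with 32 cores, something like 64 partitions gives every core work and leaves room for stragglers:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("repartition-example")
val sc = new SparkContext(conf)

// Hypothetical input -- substitute your own data source.
val data = sc.textFile("hdfs:///path/to/input")

// repartition(n) reshuffles the RDD into n partitions, so mapPartitions
// runs once per partition instead of just twice.
val result = data.repartition(64).mapPartitions { iter =>
  // heavy per-partition computation goes here; pass-through shown as a stub
  iter
}

println(result.count())

Note that repartition incurs a full shuffle. If the RDD comes straight from an input source, you can often avoid that by asking for more partitions up front, e.g. sc.textFile(path, minPartitions = 64).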
On Fri, Sep 26, 2014 at 10:19 AM, jamborta <jambo...@gmail.com> wrote:
> Hi all,
>
> I am using mapPartitions to do some heavy computing on subsets of the data.
> I have a dataset with about 1m rows, running on a 32 core cluster.
> Unfortunately, it seems that mapPartitions splits the data into two sets, so
> it is only running on two cores.
>
> Is there a way to force it to split into smaller chunks?
>
> thanks

--
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning
440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegm...@velos.io W: www.velos.io