Hi Flavio,

Doing batch += ... shouldn't work: it will create a new batch for each element in myRDD. (Also, val declares an immutable variable; var is for mutable ones.) You could use something like accumulators <http://spark.apache.org/docs/latest/programming-guide.html#accumulators>:
    val accum = sc.accumulator(0, "Some accumulator")
    val NUM = 100
    val SIZE = myRDD.count()
    val LAST = SIZE % NUM
    val MULTIPLE = (SIZE / NUM) * NUM  // 300 when SIZE is 350

    myRDD.map { x =>
      accum += 1
      if (accum.value < MULTIPLE) null else x
    }

So if there are 350 elements, it should return null for the first 300 elements and the actual values for the last 50. Then we can filter out the nulls to get the remainder elements (and finally flush the new RDD containing them). One caveat: Spark only guarantees that tasks can add to an accumulator, not read its value, so treat this as a sketch of the idea rather than working code.

On map vs mapPartitions: mapPartitions is mainly an efficiency optimization (as you can see here <http://spark-summit.org/wp-content/uploads/2013/10/Wendell-Spark-Performance.pdf>, here <http://stackoverflow.com/questions/21185092/apache-spark-map-vs-mappartitions> and here <http://bzhangusc.wordpress.com/2014/06/19/optimize-map-performamce-with-mappartitions/>). So for simpler code go with map, and for efficiency go with mapPartitions.

Regards,
Kamal
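A quick local sketch of the remainder logic, in case it helps. Since tasks can't reliably read an accumulator's value, I'm substituting zipWithIndex here (an RDD has the same method); plain Scala collections stand in for myRDD, and all names are illustrative:

```scala
// Local stand-in for myRDD; on a real RDD the same shape works
// with rdd.zipWithIndex followed by a filter.
val NUM = 100
val data = (1 to 350).toList
val SIZE = data.size
val MULTIPLE = (SIZE / NUM) * NUM  // 300 when SIZE is 350

// Pair each element with its index and keep only the last SIZE % NUM
// elements -- this replaces the null-then-filter two-step in one pass.
val remainder = data.zipWithIndex.collect {
  case (x, i) if i >= MULTIPLE => x
}

println(remainder.size)  // 50: the elements past the last full batch
```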
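On the map vs mapPartitions point, here is a tiny local sketch of why the per-partition shape can be cheaper (expensiveSetup is a made-up placeholder for something like opening a DB connection; a Range stands in for one partition's iterator):

```scala
// Placeholder for per-task setup work (e.g. opening a connection).
def expensiveSetup(): Int => Int = { x => x * 2 }

// map-style: the setup cost is paid once per element.
val viaMap = (1 to 5).map { x =>
  val f = expensiveSetup()
  f(x)
}

// mapPartitions-style: setup runs once, then the iterator is streamed.
val viaPartitions = {
  val f = expensiveSetup()  // once per "partition"
  (1 to 5).iterator.map(f).toList
}
```

Both produce the same values; the difference is only how often the setup runs.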