Do you want to transform the RDD, or just produce some side effect with its contents? If the latter, you want foreachPartition, not mapPartitions.
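For context on why the count comes out as 0: the iterator handed to the function is one-pass, so calling length on it walks it to the end, and the same exhausted iterator is then returned. Here is a minimal sketch in plain Scala, with no Spark needed. The names onePass, consumeThenReturn and bufferThenReturn are made up for illustration, and onePass only stands in for a partition iterator.

```scala
object PartitionIteratorDemo {
  // Hypothetical stand-in for a partition: a one-pass iterator over 1..n.
  def onePass(n: Int): Iterator[Int] = new Iterator[Int] {
    private var i = 0
    def hasNext: Boolean = i < n
    def next(): Int = { i += 1; i }
  }

  // Same shape as the function in the question: length consumes every
  // element, so the iterator returned afterwards is already empty.
  def consumeThenReturn(iter: Iterator[Int]): Iterator[Int] = {
    println(iter.length)
    iter
  }

  // Buffer the elements once, work with the buffered collection,
  // then return a fresh iterator over it.
  def bufferThenReturn(iter: Iterator[Int]): Iterator[Int] = {
    val items = iter.toVector
    println(items.length)
    items.iterator
  }

  def main(args: Array[String]): Unit = {
    println(consumeThenReturn(onePass(1000)).size) // prints 1000, then 0
    println(bufferThenReturn(onePass(1000)).size)  // prints 1000, then 1000
  }
}
```

The body of bufferThenReturn should work unchanged as the function passed to rdd.mapPartitions. If nothing needs to come back, the same buffering can go inside rdd.foreachPartition, where there is no iterator to return. Note that buffering holds the whole partition in memory at once.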
On Fri, Jun 26, 2015 at 11:52 AM, Wang, Ningjun (LNG-NPV) <[email protected]> wrote:

> In rdd.mapPartitions(…), if I try to iterate through the items in the
> partition, everything breaks. For example:
>
> val rdd = sc.parallelize(1 to 1000, 3)
> val count = rdd.mapPartitions(iter => {
>   println(iter.length)
>   iter
> }).count()
>
> The count is 0. This is incorrect; the count should be 1000. If I just
> comment out the line println(iter.length), then the count becomes 1000
> correctly.
>
> Does this mean I cannot iterate through iter in mapPartitions? I want to
> get all items in a partition and compose one request to send to an external
> system. How can I achieve that if I am not allowed to iterate through the
> items in the partition?
>
> Ningjun
