Hi,
I'd like to perform an operation on an RDD that changes the values of ONLY
some items, without making a full copy or a full scan of the data.
This is useful when I handle a large RDD and each time need to change only
a small fraction of the data while keeping the rest unchanged.
I certainly don't want to copy all of the data into a new RDD.
For example, suppose I have an RDD that contains the integers from 0 to
99. I want to change the first element of the RDD from 0 to 1 and leave
the other elements untouched.
I tried this, but it doesn't work:

    var rdd = parallelize(Range(0,100))
    rdd.mapPartitions({iter=> iter(0) = 1})

The reported error is: value update is not a member of Iterator[Int]
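
For comparison, here is a minimal sketch of the intended result. It assumes a local SparkContext and that partition 0 is non-empty; the names ReplaceFirstElement and replaceFirst are made up for illustration. RDDs are immutable, so this builds a new RDD with mapPartitionsWithIndex instead of updating in place, and every element still passes through the iterator, so it does not satisfy the no-scan requirement:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReplaceFirstElement {
  // Hypothetical helper: swaps the first element of partition 0 for newValue.
  // All other elements are passed through lazily, without being collected.
  def replaceFirst(index: Int, iter: Iterator[Int], newValue: Int): Iterator[Int] =
    if (index == 0 && iter.hasNext) {
      iter.next() // discard the old first element
      Iterator.single(newValue) ++ iter
    } else iter

  def main(args: Array[String]): Unit = {
    // Assumption: a local master with two threads, just for trying this out.
    val conf = new SparkConf().setAppName("replace-first").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Range(0, 100))
      // Produces a new RDD; the original rdd is left unchanged.
      val updated = rdd.mapPartitionsWithIndex((idx, iter) => replaceFirst(idx, iter, 1))
      println(updated.take(3).mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

The assignment iter(0) = 1 fails because Scala rewrites it to iter.update(0, 1), and Iterator has no update method. An iterator can only be consumed and re-wrapped, as replaceFirst does above.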
Does anyone know how to make this kind of in-place update work?