Re: Is it possible to just change the value of the items in RDD without making a full copy?

2014-12-02 Thread Sean Owen
Although it feels like you are copying an RDD when you map it, it is not necessarily being copied literally. Your map function may pass most objects through unchanged, so there may not be as much overhead as you think. I don't think you can avoid a scan of the data unless you can somehow know that
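
A minimal sketch of the pass-through idea, not taken from the thread: the Item record type, the ids in the update map, and the local[*] master are assumptions made for illustration. The map still visits every element, but elements without an update are returned as the same object, so only the changed items are newly allocated.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object UpdateFraction {
  // Hypothetical record type, used only for this illustration.
  case class Item(id: Int, value: Double)

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("update-fraction").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val items = sc.parallelize((0 until 100).map(i => Item(i, 0.0)))

      // Hypothetical small set of updates, shipped to executors once.
      val updates = sc.broadcast(Map(3 -> 1.5, 42 -> 2.5))

      // Every element is scanned, but untouched items are passed through as-is.
      val updated = items.map { item =>
        updates.value.get(item.id) match {
          case Some(v) => item.copy(value = v) // new object only for changed items
          case None    => item                 // same object, no copy
        }
      }

      // Only the two updated items differ from the initial value.
      val changed = updated.filter(_.value != 0.0).collect().sortBy(_.id)
      assert(changed.toSeq == Seq(Item(3, 1.5), Item(42, 2.5)))
    } finally {
      sc.stop()
    }
  }
}
```

The result is still a new RDD, since RDDs are immutable; the saving is in object allocation per element, not in avoiding the scan.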

Re: Is it possible to just change the value of the items in RDD without making a full copy?

2014-12-02 Thread Yanbo Liang
You cannot modify an RDD in mapPartitions because RDDs are immutable. Applying a transformation to an RDD produces a new RDD. If you want to modify only a fraction of the RDD, try collecting the list of new values to the driver or using a broadcast variable after each iteration, not to

Re: Is it possible to just change the value of the items in RDD without making a full copy?

2014-12-02 Thread Akhil Das
RDDs are immutable, so if you want to change the value of an RDD then you have to create another RDD from it by applying some transformation. Not sure if this is what you are looking for: val rdd = sc.parallelize(Range(0,100)) val rdd2 = rdd.map(x => { println("Value : " +

Is it possible to just change the value of the items in RDD without making a full copy?

2014-12-02 Thread Xuelin Cao
Hi, I'd like to perform an operation on an RDD that changes the values of ONLY some items, without making a full copy or a full scan of the data. This is useful when I need to handle a large RDD and each time need to change only a small fraction of the data, keeping the rest unchanged.