'map' works as expected. The mutable object here just comes from the use
case: the data needs to be updated every day.
I am wondering what the best way to do that is; I am not sure that Spark
supports updating well.
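
For the record, here is a minimal, self-contained sketch of the map-based
approach (the object name, master URL, and app name are illustrative, not
from the thread). The point is that an "update" builds a new RDD from the
old one instead of mutating objects in place:

import org.apache.spark.SparkContext

// An immutable variant of A: an "update" produces a new instance via copy.
case class A(a: Int)

object UpdateExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "update-example")

    val as = sc.parallelize(List(A(1), A(2)))

    // map returns a new RDD holding the updated values; 'as' is untouched.
    val updated = as.map(_.copy(a = 100))

    println(updated.collect().mkString(", "))  // prints: A(100), A(100)

    sc.stop()
  }
}

For a daily refresh, you would typically re-read or recompute the data and
build a fresh RDD each day, since RDDs themselves are immutable.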


2013/11/6 Mohit Jaggi <[email protected]>

> my guess is you need to use a map for this. foreach is meant for
> side-effects, and I am not sure that mutating the object itself is an
> intended use (the closure runs on copies of the objects on the workers,
> so the driver never sees the mutation). Also, the objects are supposed to
> be immutable; yours isn't.
>
>
> On Tue, Nov 5, 2013 at 4:40 PM, Hao REN <[email protected]> wrote:
>
>> Hi,
>>
>> Just a quick question:
>>
>> When playing Spark with my toy code as below, I get some unexpected
>> results.
>>
>>
>> case class A(var a: Int) {
>>   def setA() = { a = 100 }
>> }
>>
>> val as = sc.parallelize(List(A(1), A(2)))  // it is an RDD[A]
>>
>> as.foreach(_.setA())
>>
>> as.collect  // it gives Array[this.A] = Array(A(1), A(2))
>>
>>
>> The expected result is Array(A(100), A(100)). I am just trying to update
>> the contents of the A objects that reside in the RDD.
>>
>> 1) Does foreach do the right thing here?
>> 2) What is the best way to update the objects in an RDD? Should I use
>> 'map' instead?
>>
>> Thank you.
>>
>> Hao
>>
>> --
>>  REN Hao
>>
>> Data Engineer @ ClaraVista
>>
>> Paris, France
>>
>> Tel:  +33 06 14 54 57 24
>>
>
>


-- 
REN Hao

Data Engineer @ ClaraVista

Paris, France

Tel:  +33 06 14 54 57 24
