Actually, I do not know how to do something like this, or whether it is even possible - hence my tentative suggestion.
Can you already declare persistent memory objects per worker? I tried
something like constructing a singleton object within map functions, but
that didn't work as it seemed to actually serialize singletons and pass
them back and forth in a weird manner.

On Mon, Apr 28, 2014 at 1:23 AM, Sean Owen <so...@cloudera.com> wrote:

> On Mon, Apr 28, 2014 at 8:22 AM, Sung Hwan Chung <coded...@cs.stanford.edu>
> wrote:
>>
>> e.g. something like
>>
>> rdd.mapPartition((rows : Iterator[String]) => {
>>   var idx = 0
>>   rows.map((row: String) => {
>>     val valueMap = SparkWorker.getMemoryContent("valMap")
>>     val prevVal = valueMap(idx)
>>     idx += 1
>>     ...
>>   })
>>   ...
>> })
>>
>> The developer can implement their own fault recovery mechanism if the
>> worker has crashed and lost the memory content.
>>
>
> Yea you can always just declare your own per-partition data structures in
> a function block like that, right? valueMap can be initialized to an empty
> map, loaded from somewhere, or even a value that is broadcast from the
> driver.
>
> That's certainly better than tacking data onto RDDs.
>
> It's not restored if the computation is lost of course, but in this and
> many other cases, it's fine, as it is just for some cached intermediate
> results.
>
> This already works then or did I misunderstand the original use case?
>
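
For what it's worth, here is a minimal sketch of the pattern Sean describes: per-partition mutable state declared inside a mapPartitions block, seeded from a value broadcast by the driver. The names (PerPartitionStateExample, initialValues, valMapBroadcast) and the toy data/update logic are my own assumptions, not anything from the thread; it only illustrates that the state lives for the duration of one task and is rebuilt if the task is re-run.

  import org.apache.spark.{SparkConf, SparkContext}

  object PerPartitionStateExample {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("per-partition-state"))

      // Hypothetical initial values, shipped to every executor once via a
      // broadcast variable rather than re-serialized inside each task closure.
      val initialValues: Map[Int, Double] = Map(0 -> 1.0, 1 -> 2.0)
      val valMapBroadcast = sc.broadcast(initialValues)

      val rdd = sc.parallelize(Seq("a", "bb", "ccc"), numSlices = 2)

      val result = rdd.mapPartitions { rows =>
        // Per-partition state: created once per partition, lives only for the
        // duration of this task, and is lost (then rebuilt) if the task is re-run.
        val valueMap = collection.mutable.Map[Int, Double]() ++= valMapBroadcast.value
        var idx = 0
        rows.map { row =>
          val prevVal = valueMap.getOrElse(idx, 0.0)
          valueMap(idx) = prevVal + row.length   // arbitrary toy update
          idx += 1
          (row, prevVal)
        }
      }

      result.collect().foreach(println)
      sc.stop()
    }
  }

This keeps the intermediate map out of the RDD itself, but as noted above it does not survive a lost computation; the map is simply reinitialized from the broadcast value when the partition is recomputed.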