Re: function state lost when next RDD is processed

Mark Hamstra Fri, 28 Mar 2014 10:51:48 -0700

As long as the amount of state being passed is relatively small, it's
probably easiest to send it back to the driver and to introduce it into RDD
transformations as the zero value of a fold.
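
A minimal sketch of that idea, in plain Python rather than Spark so it runs anywhere: each batch stands in for one RDD, and the names fold_batch and running_totals are made up for illustration. The driver keeps the running state and hands it to the next batch as the starting value of the fold. One caveat if this is ported to a real RDD.fold(zeroValue, op): Spark applies the zero value once per partition and once more when merging partition results, so a non-neutral zero would be counted several times. There it is likely safer to fold with a neutral zero and combine the result with the carried state on the driver.

```python
from functools import reduce


def fold_batch(batch, carried):
    """Fold one batch of numbers, starting from the state carried over
    from the previous batch (the fold's starting value)."""
    return reduce(lambda acc, x: acc + x, batch, carried)


def running_totals(batches, initial=0):
    """Driver-side loop: thread the small piece of state through each batch.

    `batches` is a hypothetical stand-in for the sequence of RDDs in a stream.
    """
    state = initial
    totals = []
    for batch in batches:
        state = fold_batch(batch, state)  # state survives to the next batch
        totals.append(state)
    return totals


if __name__ == "__main__":
    print(running_totals([[1, 2, 3], [4, 5], [6]]))
```

The second batch starts from the first batch's sum instead of from 0, which is the behaviour the original question asks for.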



On Fri, Mar 28, 2014 at 7:12 AM, Adrian Mocanu <amoc...@verticalscope.com> wrote:

> I'd like to resurrect this thread since I don't have an answer yet.
>
>
>
> *From:* Adrian Mocanu [mailto:amoc...@verticalscope.com]
> *Sent:* March-27-14 10:04 AM
> *To:* u...@spark.incubator.apache.org
> *Subject:* function state lost when next RDD is processed
>
>
>
> Is there a way to pass a custom function to Spark so that it runs over the
> entire stream? For example, say I have a function which sums up values in
> each RDD and then across RDDs.
>
>
>
> I've tried map, transform, and reduce. They each apply my sum function to
> a single RDD. When the next RDD arrives, the function starts from 0 again,
> so the sum of the previous RDD is lost.
>
>
>
> Does Spark support a way of passing a custom function so that its state is
> preserved across RDDs and not only within a single RDD?
>
>
>
> Thanks
>
> -Adrian
>
>
>
