Are you referring to Spark Streaming? Can you save the sum as an RDD and keep joining the two RDDs together?
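A minimal sketch of that idea (names and types are illustrative, assuming a keyed DStream[(String, Long)] and a driver-side reference to the running-totals RDD) could look like this:

import org.apache.spark.SparkContext._   // pair-RDD functions such as reduceByKey
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

def keepRunningTotals(stream: DStream[(String, Long)],
                      initial: RDD[(String, Long)]): Unit = {
  // Driver-side reference to the running totals; re-assigned on every batch.
  var totals: RDD[(String, Long)] = initial

  stream.foreachRDD { batch =>
    // Merge the new batch into the running totals and re-sum per key.
    totals = totals.union(batch).reduceByKey(_ + _).cache()
    // Checkpointing `totals` periodically keeps its lineage from growing unbounded.
  }
}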
Regards,
Mayur

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Fri, Mar 28, 2014 at 10:47 AM, Adrian Mocanu <amoc...@verticalscope.com> wrote:

> Thanks!
>
> Ya, that’s what I’m doing so far, but I wanted to see if it’s possible to
> keep the tuples inside Spark for fault tolerance purposes.
>
> -A
>
> *From:* Mark Hamstra [mailto:m...@clearstorydata.com]
> *Sent:* March-28-14 10:45 AM
> *To:* user@spark.apache.org
> *Subject:* Re: function state lost when next RDD is processed
>
> As long as the amount of state being passed is relatively small, it's
> probably easiest to send it back to the driver and to introduce it into RDD
> transformations as the zero value of a fold.
>
> On Fri, Mar 28, 2014 at 7:12 AM, Adrian Mocanu <amoc...@verticalscope.com>
> wrote:
>
> I’d like to resurrect this thread since I don’t have an answer yet.
>
> *From:* Adrian Mocanu [mailto:amoc...@verticalscope.com]
> *Sent:* March-27-14 10:04 AM
> *To:* u...@spark.incubator.apache.org
> *Subject:* function state lost when next RDD is processed
>
> Is there a way to pass a custom function to Spark to run it on the entire
> stream? For example, say I have a function which sums up values in each RDD
> and then across RDDs.
>
> I’ve tried with map, transform, and reduce. They all apply my sum function on
> one RDD; when the next RDD comes, the function starts from 0, so the sum of
> the previous RDD is lost.
>
> Does Spark support a way of passing a custom function so that its state is
> preserved across RDDs and not only within an RDD?
>
> Thanks,
> -Adrian
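For reference, a minimal sketch of Mark's fold-based suggestion from the quoted thread, assuming the state is just a single Long sum (names are illustrative). Note that RDD.fold applies its zero value once per partition, so the carried-over total is added on the driver rather than passed in as the fold's zero:

import org.apache.spark.streaming.dstream.DStream

def runningSum(stream: DStream[Long]): Unit = {
  // Small piece of state kept on the driver and carried across batches.
  var sumSoFar: Long = 0L

  stream.foreachRDD { rdd =>
    // Fold the current batch with a neutral zero, then combine the result
    // with the total accumulated from previous RDDs on the driver side.
    sumSoFar += rdd.fold(0L)(_ + _)
    println("running sum so far: " + sumSoFar)
  }
}

For keyed state, Spark Streaming's updateStateByKey serves a similar purpose: it maintains per-key state across batches inside Spark, with fault tolerance provided through checkpointing.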