Are you referring to Spark Streaming? Can you save the sum as an RDD and keep joining the two RDDs together?
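A minimal sketch of that idea (names and types are illustrative, assuming a keyed DStream[(String, Long)] and a driver-side reference to the running-totals RDD) could look like this:

import org.apache.spark.SparkContext._   // pair-RDD functions such as reduceByKey
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

def keepRunningTotals(stream: DStream[(String, Long)],
                      initial: RDD[(String, Long)]): Unit = {
  // Driver-side reference to the running totals; re-assigned on every batch.
  var totals: RDD[(String, Long)] = initial

  stream.foreachRDD { batch =>
    // Merge the new batch into the running totals and re-sum per key.
    totals = totals.union(batch).reduceByKey(_ + _).cache()
    // Checkpointing `totals` periodically keeps its lineage from growing unbounded.
  }
}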
Regards,
Mayur

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Fri, Mar 28, 2014 at 10:47 AM, Adrian Mocanu <amoc...@verticalscope.com> wrote:

> Thanks!
>
> Ya, that’s what I’m doing so far, but I wanted to see if it’s possible to
> keep the tuples inside Spark for fault tolerance purposes.
>
> -A
>
> *From:* Mark Hamstra [mailto:m...@clearstorydata.com]
> *Sent:* March-28-14 10:45 AM
> *To:* user@spark.apache.org
> *Subject:* Re: function state lost when next RDD is processed
>
> As long as the amount of state being passed is relatively small, it's
> probably easiest to send it back to the driver and to introduce it into RDD
> transformations as the zero value of a fold.
>
> On Fri, Mar 28, 2014 at 7:12 AM, Adrian Mocanu <amoc...@verticalscope.com>
> wrote:
>
> I’d like to resurrect this thread since I don’t have an answer yet.
>
> *From:* Adrian Mocanu [mailto:amoc...@verticalscope.com]
> *Sent:* March-27-14 10:04 AM
> *To:* u...@spark.incubator.apache.org
> *Subject:* function state lost when next RDD is processed
>
> Is there a way to pass a custom function to Spark to run it on the entire
> stream? For example, say I have a function which sums up values in each RDD
> and then across RDDs.
>
> I’ve tried with map, transform, and reduce. They all apply my sum function on
> one RDD; when the next RDD comes, the function starts from 0, so the sum of
> the previous RDD is lost.
>
> Does Spark support a way of passing a custom function so that its state is
> preserved across RDDs and not only within an RDD?
>
> Thanks,
> -Adrian
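For reference, a minimal sketch of Mark's fold-based suggestion from the quoted thread, assuming the state is just a single Long sum (names are illustrative). Note that RDD.fold applies its zero value once per partition, so the carried-over total is added on the driver rather than passed in as the fold's zero:

import org.apache.spark.streaming.dstream.DStream

def runningSum(stream: DStream[Long]): Unit = {
  // Small piece of state kept on the driver and carried across batches.
  var sumSoFar: Long = 0L

  stream.foreachRDD { rdd =>
    // Fold the current batch with a neutral zero, then combine the result
    // with the total accumulated from previous RDDs on the driver side.
    sumSoFar += rdd.fold(0L)(_ + _)
    println("running sum so far: " + sumSoFar)
  }
}

For keyed state, Spark Streaming's updateStateByKey serves a similar purpose: it maintains per-key state across batches inside Spark, with fault tolerance provided through checkpointing.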