Thanks Arush. I will check that out.

On Wed, Feb 18, 2015 at 11:06 AM, Arush Kharbanda <ar...@sigmoidanalytics.com>
wrote:
> I find monoids pretty useful in this respect, basically separating out
> the logic into a monoid and then applying that logic to either a stream
> or a batch. A list of such practices could be really useful.
>
> On Thu, Feb 19, 2015 at 12:26 AM, Jean-Pascal Billaud <j...@tellapart.com>
> wrote:
>
>> Hey,
>>
>> It seems pretty clear that one of the strengths of Spark is being able
>> to share your code between your batch and streaming layers. Though,
>> given that Spark Streaming uses a DStream, which is a set of RDDs,
>> while batch Spark uses a single RDD, there might be some complexity
>> associated with it.
>>
>> Of course, since a DStream is essentially a set of RDDs, one can just
>> run the same code at the RDD granularity using DStream::foreachRDD.
>> While this should work for maps, I am not sure how it can work when it
>> comes to the reduce phase, given that a group of keys can span multiple
>> RDDs.
>>
>> One option is to change the dataset object that a job works on. For
>> example, instead of passing an RDD to a class method, one passes a
>> higher-level object (MetaRDD) that wraps either an RDD or a DStream
>> depending on the context. The job then calls its regular maps, reduces
>> and so on, and the MetaRDD wrapper delegates accordingly.
>>
>> I would just like to know the official best practice from the Spark
>> community, though.
>>
>> Thanks,
>>
>
> --
>
> Arush Kharbanda || Technical Teamlead
>
> ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
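To make the monoid idea concrete, here is a minimal sketch of what I
understood from Arush's suggestion. The Monoid trait, LongSum and
CountsByKey names are just placeholders I picked for illustration; only
reduceByKey is the actual Spark/Spark Streaming API:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.dstream.DStream

    // A minimal monoid: an identity value plus an associative combine.
    trait Monoid[T] extends Serializable {
      def zero: T
      def plus(a: T, b: T): T
    }

    // Example: summing counts.
    object LongSum extends Monoid[Long] {
      def zero: Long = 0L
      def plus(a: Long, b: Long): Long = a + b
    }

    // The aggregation logic lives in one place and is applied to either layer.
    object CountsByKey {
      // Batch path: a plain RDD of (key, count) pairs.
      def batch(events: RDD[(String, Long)]): RDD[(String, Long)] =
        events.reduceByKey(LongSum.plus)

      // Streaming path: the same combine function, applied within each
      // micro-batch of the DStream.
      def streaming(events: DStream[(String, Long)]): DStream[(String, Long)] =
        events.reduceByKey(LongSum.plus)
    }

The point is that only the thin batch/streaming entry points differ; the
combine function itself is written once and reused.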
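And here is a rough sketch of the MetaRDD-style wrapper JP describes,
together with one way to address the concern about keys spanning multiple
RDDs. The KeyedCounts, BatchCounts, StreamCounts and SharedJob names are
hypothetical, and the 60-second window is an arbitrary choice; filter,
reduceByKey and reduceByKeyAndWindow are real Spark / Spark Streaming
operations:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.Seconds
    import org.apache.spark.streaming.dstream.DStream

    // A hypothetical "MetaRDD"-style wrapper (not an official Spark API):
    // the job is written against this trait and does not care whether the
    // underlying dataset is a batch RDD or a DStream.
    sealed trait KeyedCounts {
      def filterKeys(p: String => Boolean): KeyedCounts
      def reduceByKey(f: (Long, Long) => Long): KeyedCounts
    }

    final case class BatchCounts(rdd: RDD[(String, Long)]) extends KeyedCounts {
      def filterKeys(p: String => Boolean): KeyedCounts =
        BatchCounts(rdd.filter { case (k, _) => p(k) })
      def reduceByKey(f: (Long, Long) => Long): KeyedCounts =
        BatchCounts(rdd.reduceByKey(f))
    }

    final case class StreamCounts(stream: DStream[(String, Long)]) extends KeyedCounts {
      def filterKeys(p: String => Boolean): KeyedCounts =
        StreamCounts(stream.filter { case (k, _) => p(k) })
      // A plain reduceByKey on a DStream only combines within one
      // micro-batch; if the same key must be combined across micro-batches,
      // a windowed reduce (or updateStateByKey) is needed, since foreachRDD
      // alone only ever sees one RDD at a time.
      def reduceByKey(f: (Long, Long) => Long): KeyedCounts =
        StreamCounts(stream.reduceByKeyAndWindow(f, Seconds(60)))
    }

    object SharedJob {
      // The job logic, written once against the wrapper:
      def run(data: KeyedCounts): KeyedCounts =
        data.filterKeys(_.nonEmpty).reduceByKey(_ + _)
    }

Whether a window (or stateful update) is the right semantics for the
streaming side obviously depends on the job, which is probably where a
documented best practice from the community would help most.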