Thanks Arush. I will check that out.

On Wed, Feb 18, 2015 at 11:06 AM, Arush Kharbanda <ar...@sigmoidanalytics.com>
wrote:
> I find monoids pretty useful in this respect, basically separating out
> the logic into a monoid and then applying that logic to either a stream
> or a batch. A list of such practices could be really useful.
>
> On Thu, Feb 19, 2015 at 12:26 AM, Jean-Pascal Billaud <j...@tellapart.com>
> wrote:
>
>> Hey,
>>
>> It seems pretty clear that one of the strengths of Spark is being able
>> to share your code between your batch and streaming layers. Though,
>> given that Spark Streaming uses a DStream, which is a set of RDDs,
>> while batch Spark uses a single RDD, there might be some complexity
>> associated with it.
>>
>> Of course, since a DStream is essentially a set of RDDs, one can just
>> run the same code at the RDD granularity using DStream::foreachRDD.
>> While this should work for maps, I am not sure how it can work when it
>> comes to the reduce phase, given that a group of keys can span multiple
>> RDDs.
>>
>> One option is to change the dataset object that a job works on. For
>> example, instead of passing an RDD to a class method, one passes a
>> higher-level object (MetaRDD) that wraps either an RDD or a DStream
>> depending on the context. The job then calls its regular maps, reduces
>> and so on, and the MetaRDD wrapper delegates accordingly.
>>
>> I would just like to know the official best practice from the Spark
>> community, though.
>>
>> Thanks,
>>
>
> --
>
> Arush Kharbanda || Technical Teamlead
>
> ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
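To make the monoid idea concrete, here is a minimal sketch of what I
understood from Arush's suggestion. The Monoid trait, LongSum and
CountsByKey names are just placeholders I picked for illustration; only
reduceByKey is the actual Spark/Spark Streaming API:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.dstream.DStream

    // A minimal monoid: an identity value plus an associative combine.
    trait Monoid[T] extends Serializable {
      def zero: T
      def plus(a: T, b: T): T
    }

    // Example: summing counts.
    object LongSum extends Monoid[Long] {
      def zero: Long = 0L
      def plus(a: Long, b: Long): Long = a + b
    }

    // The aggregation logic lives in one place and is applied to either layer.
    object CountsByKey {
      // Batch path: a plain RDD of (key, count) pairs.
      def batch(events: RDD[(String, Long)]): RDD[(String, Long)] =
        events.reduceByKey(LongSum.plus)

      // Streaming path: the same combine function, applied within each
      // micro-batch of the DStream.
      def streaming(events: DStream[(String, Long)]): DStream[(String, Long)] =
        events.reduceByKey(LongSum.plus)
    }

The point is that only the thin batch/streaming entry points differ; the
combine function itself is written once and reused.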
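And here is a rough sketch of the MetaRDD-style wrapper JP describes,
together with one way to address the concern about keys spanning multiple
RDDs. The KeyedCounts, BatchCounts, StreamCounts and SharedJob names are
hypothetical, and the 60-second window is an arbitrary choice; filter,
reduceByKey and reduceByKeyAndWindow are real Spark / Spark Streaming
operations:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.Seconds
    import org.apache.spark.streaming.dstream.DStream

    // A hypothetical "MetaRDD"-style wrapper (not an official Spark API):
    // the job is written against this trait and does not care whether the
    // underlying dataset is a batch RDD or a DStream.
    sealed trait KeyedCounts {
      def filterKeys(p: String => Boolean): KeyedCounts
      def reduceByKey(f: (Long, Long) => Long): KeyedCounts
    }

    final case class BatchCounts(rdd: RDD[(String, Long)]) extends KeyedCounts {
      def filterKeys(p: String => Boolean): KeyedCounts =
        BatchCounts(rdd.filter { case (k, _) => p(k) })
      def reduceByKey(f: (Long, Long) => Long): KeyedCounts =
        BatchCounts(rdd.reduceByKey(f))
    }

    final case class StreamCounts(stream: DStream[(String, Long)]) extends KeyedCounts {
      def filterKeys(p: String => Boolean): KeyedCounts =
        StreamCounts(stream.filter { case (k, _) => p(k) })
      // A plain reduceByKey on a DStream only combines within one
      // micro-batch; if the same key must be combined across micro-batches,
      // a windowed reduce (or updateStateByKey) is needed, since foreachRDD
      // alone only ever sees one RDD at a time.
      def reduceByKey(f: (Long, Long) => Long): KeyedCounts =
        StreamCounts(stream.reduceByKeyAndWindow(f, Seconds(60)))
    }

    object SharedJob {
      // The job logic, written once against the wrapper:
      def run(data: KeyedCounts): KeyedCounts =
        data.filterKeys(_.nonEmpty).reduceByKey(_ + _)
    }

Whether a window (or stateful update) is the right semantics for the
streaming side obviously depends on the job, which is probably where a
documented best practice from the community would help most.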