Hi Nicolae,

Won't creating N KafkaDirectStreams add overhead to my streaming job compared to a single DirectStream?
On Fri, Oct 2, 2015 at 1:13 AM, Nicolae Marasoiu <nicolae.maras...@adswizz.com> wrote:

> Hi,
>
> If you just need per-topic processing, why not create N different Kafka
> direct streams? When you create a Kafka direct stream you pass a list of
> topics - just pass a single one.
>
> The reusable parts of your computation can then be extracted as
> transformations/functions and shared between the streams.
>
> Nicu
>
> ------------------------------
> *From:* Adrian Tanase <atan...@adobe.com>
> *Sent:* Thursday, October 1, 2015 5:47 PM
> *To:* Cody Koeninger; Udit Mehta
> *Cc:* user
> *Subject:* Re: Kafka Direct Stream
>
> On top of that, you could make the topic part of the key (e.g. keyBy in
> .transform or manually emitting a tuple) and use one of the .xxxByKey
> operators for the processing.
>
> If you have a stable, domain-specific list of topics (e.g. 3-5 named
> topics) and the processing is *really* different, I would also look at
> filtering by topic and saving the results as different DStreams in your
> code.
>
> Either way, you need to start with Cody's tip in order to extract the
> topic name.
>
> -adrian
>
> From: Cody Koeninger
> Date: Thursday, October 1, 2015 at 5:06 PM
> To: Udit Mehta
> Cc: user
> Subject: Re: Kafka Direct Stream
>
> You can get the topic for a given partition from the offset range. You can
> either filter using that, or just keep a single RDD and match on topic when
> doing mapPartitions or foreachPartition (which I think is the better idea).
>
> http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers
>
> On Wed, Sep 30, 2015 at 5:02 PM, Udit Mehta <ume...@groupon.com> wrote:
>
>> Hi,
>>
>> I am using the Spark direct stream to consume from multiple topics in
>> Kafka. I am able to consume fine, but I am stuck on how to separate the
>> data for each topic, since I need to process the data differently
>> depending on the topic. I basically want to split the RDD consisting of N
>> topics into N RDDs, each containing one topic.
>>
>> Any help would be appreciated.
>>
>> Thanks in advance,
>> Udit

--
*VARUN SHARMA*
*Flipkart*
*Bangalore*
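
For concreteness, minimal sketches of the three approaches discussed above follow, assuming the Spark 1.5 Kafka 0.8 direct API; the broker address and the topic names topicA/topicB are placeholders. First, Cody's tip: for a direct stream, RDD partition i corresponds 1:1 to offsetRanges(i), so the topic can be looked up by partition id and matched on per partition.

import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, TaskContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

val ssc = new StreamingContext(new SparkConf().setAppName("topic-dispatch"), Seconds(10))
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder broker
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("topicA", "topicB"))

stream.foreachRDD { rdd =>
  // Grab the offset ranges on the driver while rdd is still the KafkaRDD.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd.foreachPartition { iter =>
    // Partition id indexes directly into offsetRanges for a direct stream.
    offsetRanges(TaskContext.get.partitionId).topic match {
      case "topicA" => iter.foreach { case (_, v) => println(s"A: $v") } // topicA-specific logic
      case "topicB" => iter.foreach { case (_, v) => println(s"B: $v") } // topicB-specific logic
      case _        => // ignore unexpected topics
    }
  }
}

ssc.start()
ssc.awaitTermination()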
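
Adrian's keyBy variant, sketched under the same assumptions: tag each record with its topic inside transform, then use the *ByKey operators downstream, or filter into per-topic DStreams. The tagging has to happen per-RDD (e.g. in transform), because the offset ranges are only visible on the original KafkaRDD, before any shuffle reorders partitions.

val keyedByTopic = stream.transform { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd.mapPartitionsWithIndex { (i, iter) =>
    val topic = offsetRanges(i).topic // partition i <-> offsetRanges(i)
    iter.map { case (_, value) => (topic, value) }
  }
}

// Per-topic record counts per batch, via a *ByKey operator:
val countsPerTopic = keyedByTopic.mapValues(_ => 1L).reduceByKey(_ + _)

// Or, for a small fixed list of topics, split into separate DStreams:
val topicAOnly = keyedByTopic.filter { case (topic, _) => topic == "topicA" }.map(_._2)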
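
Finally, Nicolae's one-stream-per-topic alternative, again a sketch with placeholder topic names; the shared logic is factored into a function applied to each stream. Since direct streams use no receivers, the overhead Varun asks about is mostly N small jobs scheduled per batch instead of one.

import org.apache.spark.streaming.dstream.DStream

// Shared processing, specialized by topic where needed.
def process(topic: String, s: DStream[(String, String)]): Unit =
  s.foreachRDD(rdd => println(s"$topic: ${rdd.count} records"))

val perTopic = Seq("topicA", "topicB", "topicC").map { t =>
  t -> KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, Set(t))
}
perTopic.foreach { case (t, s) => process(t, s) }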