Hi Nicolae,

Won't creating N KafkaDirectStreams add overhead to my streaming job compared to a single DirectStream?
On Fri, Oct 2, 2015 at 1:13 AM, Nicolae Marasoiu <nicolae.maras...@adswizz.com> wrote:

> Hi,
>
> If you just need per-topic processing, why not create N different Kafka
> direct streams? When you create a Kafka direct stream you pass a list of
> topics - just pass a single one.
>
> The reusable parts of your computation can then be extracted as
> transformations/functions and shared between the streams.
>
> Nicu
>
> ------------------------------
> *From:* Adrian Tanase <atan...@adobe.com>
> *Sent:* Thursday, October 1, 2015 5:47 PM
> *To:* Cody Koeninger; Udit Mehta
> *Cc:* user
> *Subject:* Re: Kafka Direct Stream
>
> On top of that, you could make the topic part of the key (e.g. keyBy in
> .transform or manually emitting a tuple) and use one of the .xxxByKey
> operators for the processing.
>
> If you have a stable, domain-specific list of topics (e.g. 3-5 named
> topics) and the processing is *really* different, I would also look at
> filtering by topic and saving the results as different DStreams in your
> code.
>
> Either way, you need to start with Cody's tip in order to extract the
> topic name.
>
> -adrian
>
> From: Cody Koeninger
> Date: Thursday, October 1, 2015 at 5:06 PM
> To: Udit Mehta
> Cc: user
> Subject: Re: Kafka Direct Stream
>
> You can get the topic for a given partition from the offset range. You can
> either filter using that, or just keep a single RDD and match on topic when
> doing mapPartitions or foreachPartition (which I think is the better idea).
>
> http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers
>
> On Wed, Sep 30, 2015 at 5:02 PM, Udit Mehta <ume...@groupon.com> wrote:
>
>> Hi,
>>
>> I am using the Spark direct stream to consume from multiple topics in
>> Kafka. I am able to consume fine, but I am stuck on how to separate the
>> data for each topic, since I need to process the data differently
>> depending on the topic. I basically want to split the RDD consisting of N
>> topics into N RDDs, each containing one topic.
>>
>> Any help would be appreciated.
>>
>> Thanks in advance,
>> Udit

--
*VARUN SHARMA*
*Flipkart*
*Bangalore*
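
For concreteness, minimal sketches of the three approaches discussed above follow, assuming the Spark 1.5 Kafka 0.8 direct API; the broker address and the topic names topicA/topicB are placeholders. First, Cody's tip: for a direct stream, RDD partition i corresponds 1:1 to offsetRanges(i), so the topic can be looked up by partition id and matched on per partition.

import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, TaskContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

val ssc = new StreamingContext(new SparkConf().setAppName("topic-dispatch"), Seconds(10))
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder broker
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("topicA", "topicB"))

stream.foreachRDD { rdd =>
  // Grab the offset ranges on the driver while rdd is still the KafkaRDD.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd.foreachPartition { iter =>
    // Partition id indexes directly into offsetRanges for a direct stream.
    offsetRanges(TaskContext.get.partitionId).topic match {
      case "topicA" => iter.foreach { case (_, v) => println(s"A: $v") } // topicA-specific logic
      case "topicB" => iter.foreach { case (_, v) => println(s"B: $v") } // topicB-specific logic
      case _        => // ignore unexpected topics
    }
  }
}

ssc.start()
ssc.awaitTermination()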
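
Adrian's keyBy variant, sketched under the same assumptions: tag each record with its topic inside transform, then use the *ByKey operators downstream, or filter into per-topic DStreams. The tagging has to happen per-RDD (e.g. in transform), because the offset ranges are only visible on the original KafkaRDD, before any shuffle reorders partitions.

val keyedByTopic = stream.transform { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd.mapPartitionsWithIndex { (i, iter) =>
    val topic = offsetRanges(i).topic // partition i <-> offsetRanges(i)
    iter.map { case (_, value) => (topic, value) }
  }
}

// Per-topic record counts per batch, via a *ByKey operator:
val countsPerTopic = keyedByTopic.mapValues(_ => 1L).reduceByKey(_ + _)

// Or, for a small fixed list of topics, split into separate DStreams:
val topicAOnly = keyedByTopic.filter { case (topic, _) => topic == "topicA" }.map(_._2)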
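
Finally, Nicolae's one-stream-per-topic alternative, again a sketch with placeholder topic names; the shared logic is factored into a function applied to each stream. Since direct streams use no receivers, the overhead Varun asks about is mostly N small jobs scheduled per batch instead of one.

import org.apache.spark.streaming.dstream.DStream

// Shared processing, specialized by topic where needed.
def process(topic: String, s: DStream[(String, String)]): Unit =
  s.foreachRDD(rdd => println(s"$topic: ${rdd.count} records"))

val perTopic = Seq("topicA", "topicB", "topicC").map { t =>
  t -> KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, Set(t))
}
perTopic.foreach { case (t, s) => process(t, s) }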