Also, you can put each message type in a different topic (this needs to be arranged 
upstream of your Spark Streaming app, i.e. in the publishing systems and the 
messaging brokers). Then, for each topic, you can have a dedicated 
ReceiverInputDStream, which becomes the start of a dedicated DAG pipeline 
instance for that message type. Moreover, each such DAG pipeline instance will 
run in parallel with the others.
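
A minimal sketch of that layout, using the receiver-based Kafka API so each 
topic gets its own ReceiverInputDStream. The topic names, ZooKeeper address, 
and consumer group id here are hypothetical placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(
  new SparkConf().setAppName("per-topic-pipelines"), Seconds(10))

// Hypothetical: one Kafka topic per message type, arranged upstream.
val topics = Seq("orders", "clicks", "logs")

topics.foreach { topic =>
  // Each call creates a dedicated ReceiverInputDStream, i.e. the root
  // of a separate DAG pipeline for this message type.
  val stream = KafkaUtils.createStream(
    ssc, "zk-host:2181", "dispatch-group", Map(topic -> 1))
  // Type-specific processing for this pipeline goes here.
  stream.map(_._2).foreachRDD { rdd =>
    rdd.foreach(record => println(s"[$topic] $record")) // runs on executors
  }
}

ssc.start()
ssc.awaitTermination()

Note that each receiver occupies one executor core, so the app needs at least 
topics.size + 1 cores for the pipelines to actually run in parallel.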

 

From: Tathagata Das [mailto:t...@databricks.com] 
Sent: Thursday, April 16, 2015 12:52 AM
To: Jianshi Huang
Cc: user; Shao, Saisai; Huang Jie
Subject: Re: How to do dispatching in Streaming?

 

It may be worthwhile to architect the computation in a different way. 

 

dstream.foreachRDD { rdd =>
   rdd.foreach { record =>
      // do different things for each record based on filters
   }
}
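
For concreteness, here is a hedged expansion of that skeleton. It assumes a 
hypothetical wire format where each record is a "TYPE|payload" string, and 
handleOrder/handleClick stand in for your own per-type handlers:

dstream.foreachRDD { rdd =>
  rdd.foreach { record =>
    // Hypothetical wire format: "TYPE|payload".
    record.split("\\|", 2) match {
      case Array("ORDER", payload) => handleOrder(payload) // hypothetical handler
      case Array("CLICK", payload) => handleClick(payload) // hypothetical handler
      case _ => () // unknown type: drop or log
    }
  }
}

This dispatches in a single pass over the data instead of running one filter 
pass per message type.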

 

TD

 

On Sun, Apr 12, 2015 at 7:52 PM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:

Hi,

 

I have a Kafka topic that contains dozens of different types of messages, and 
for each type I'll need to create a DStream.

 

Currently I have to filter the Kafka stream over and over, which is very 
inefficient.

 

So what's the best way to do dispatching in Spark Streaming? (one DStream -> 
multiple DStreams)
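
[For illustration, the repeated-filter pattern being described looks roughly 
like this, with hypothetical message-type tags:

val orders = kafkaStream.filter { case (_, v) => v.startsWith("ORDER") }
val clicks = kafkaStream.filter { case (_, v) => v.startsWith("CLICK") }
// ...and so on: one filter over the same stream for each of the dozens of types.]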

 

Thanks,

-- 

Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
