Also, you can put each message type in a different topic (this needs to be arranged upstream of your Spark Streaming app, i.e. in the publishing systems and the message brokers). Then, for each topic, you can have a dedicated ReceiverInputDStream, which becomes the root of a dedicated DAG pipeline instance for that message type. Moreover, each such pipeline instance runs in parallel with the others; a sketch of this layout follows below.
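A minimal sketch of the per-topic layout, assuming the receiver-based Kafka API in spark-streaming-kafka; the ZooKeeper address, group id, topic names, and the per-type processing are hypothetical placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("PerTopicDispatch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // One dedicated ReceiverInputDStream per topic; each is the start of its
    // own DAG pipeline, and the receivers ingest their topics in parallel.
    val orders = KafkaUtils.createStream(ssc, "zk1:2181", "dispatch-group", Map("orders" -> 1))
    val clicks = KafkaUtils.createStream(ssc, "zk1:2181", "dispatch-group", Map("clicks" -> 1))

    // Independent per-type pipelines (createStream yields (key, value) pairs;
    // keys are dropped here).
    orders.map(_._2).foreachRDD(rdd => rdd.foreach(r => println(s"order: $r")))
    clicks.map(_._2).foreachRDD(rdd => rdd.foreach(r => println(s"click: $r")))

    ssc.start()
    ssc.awaitTermination()

One caveat: while the receivers run in parallel, by default Spark Streaming runs the output jobs of a batch one at a time, so you may want spark.streaming.concurrentJobs > 1 if the per-topic jobs should also be scheduled concurrently.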
From: Tathagata Das [mailto:t...@databricks.com]
Sent: Thursday, April 16, 2015 12:52 AM
To: Jianshi Huang
Cc: user; Shao, Saisai; Huang Jie
Subject: Re: How to do dispatching in Streaming?

It may be worthwhile to architect the computation in a different way.

    dstream.foreachRDD { rdd =>
      rdd.foreach { record =>
        // do different things for each record based on filters
      }
    }

TD

On Sun, Apr 12, 2015 at 7:52 PM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:

Hi,

I have a Kafka topic that contains dozens of different types of messages, and for each type I need to create a DStream. Currently I have to filter the Kafka stream over and over, which is very inefficient.

So what's the best way to do dispatching in Spark Streaming? (one DStream -> multiple DStreams)

Thanks,
--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
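A minimal sketch of the single-pass dispatch TD suggests, filled out so it compiles; the Message type, its msgType field, and the handler functions are hypothetical placeholders:

    import org.apache.spark.streaming.dstream.DStream

    // Hypothetical record type and handlers for illustration.
    case class Message(msgType: String, payload: String)

    def handleOrder(m: Message): Unit = println(s"order: ${m.payload}")
    def handleClick(m: Message): Unit = println(s"click: ${m.payload}")

    def dispatch(stream: DStream[Message]): Unit =
      stream.foreachRDD { rdd =>
        rdd.foreach { record =>
          // One pass over each batch: branch per record instead of
          // filtering the same stream once per message type.
          record.msgType match {
            case "order" => handleOrder(record)
            case "click" => handleClick(record)
            case _       => // unknown type: drop or log
          }
        }
      }

Note that the body of rdd.foreach runs on the executors, so anything the handlers close over must be serializable.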