I agree, Jörn and Satish. I think I should start grouping similar kinds of messages into a single topic, with some kind of type id attached that the Spark Streaming application can then pick up.
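To make that concrete, here is roughly what I am thinking of: a minimal sketch using the kafka-0-10 direct stream, where the type id travels in the Kafka key and the job match-cases on it. The broker address, topic names and processor stubs are all just assumptions for illustration.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object TypedEventRouter {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("typed-event-router"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",            // assumed broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "typed-event-router")

    // one shared input topic; the message type travels in the Kafka key
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](Seq("events-in"), kafkaParams))

    val processed = stream.map { rec =>
      // match-case on the type id to pick the right processor
      rec.key match {
        case "orders"   => ("orders-out",   processOrder(rec.value))
        case "payments" => ("payments-out", processPayment(rec.value))
        case _          => ("deadletter",   rec.value)
      }
    }
    processed.print()   // stand-in; a real job would write to the output topics

    ssc.start()
    ssc.awaitTermination()
  }

  // placeholder processors, for illustration only
  def processOrder(v: String): String   = v
  def processPayment(v: String): String = v
}

In a real job the print() would be replaced by a write back to the corresponding output topic, along the lines of the sketch at the end of this mail.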
I can try to reduce the number of topics significantly, but even then I expect 50+ topics in the cluster. Do you think creating parallel DStreams will help here? (A sketch of what I am considering is at the end of this mail.) See the link below.

https://spark.apache.org/docs/latest/streaming-programming-guide.html#level-of-parallelism-in-data-receiving

Thanks
Shashi

On Wed, Jun 14, 2017 at 8:12 AM, satish lalam <[email protected]> wrote:

> Agree with Jörn. Dynamically creating/deleting topics is nontrivial to
> manage.
> With the limited knowledge about your scenario, it appears that you are
> using topics as some kind of message-type enum.
> If that is the case, you might be better off with one (or just a few)
> topics and a message-type field in the Kafka event itself.
> Your streaming job can then match-case incoming events on this field to
> choose the right processor for each event.
>
> On Tue, Jun 13, 2017 at 1:47 PM, Jörn Franke <[email protected]> wrote:
>
>> I do not fully understand the design here.
>> Why not send everything to one topic with an application id in the
>> message, and also write to one output topic, indicating the application
>> id?
>>
>> Can you elaborate a little bit more on the use case?
>>
>> In particular, applications deleting/creating topics dynamically can be
>> a nightmare to operate.
>>
>> > On 13. Jun 2017, at 22:03, Shashi Vishwakarma <[email protected]> wrote:
>> >
>> > Hi
>> >
>> > I have to design a Spark Streaming application with the use case
>> > below. I am looking for the best possible approach.
>> >
>> > I have an application pushing data into 1000+ different topics, each
>> > with a different purpose. Spark Streaming will receive data from each
>> > topic and, after processing, write it back to a corresponding output
>> > topic.
>> >
>> > Ex.
>> >
>> > Input Type 1 Topic --> Spark Streaming --> Output Type 1 Topic
>> > Input Type 2 Topic --> Spark Streaming --> Output Type 2 Topic
>> > Input Type 3 Topic --> Spark Streaming --> Output Type 3 Topic
>> > .
>> > .
>> > .
>> > Input Type N Topic --> Spark Streaming --> Output Type N Topic, and
>> > so on.
>> >
>> > I need to answer the following questions.
>> >
>> > 1. Is it a good idea to launch 1000+ Spark Streaming applications,
>> > one per topic? Or should I have one streaming application for all
>> > topics, since the processing logic is going to be the same?
>> > 2. If there is one streaming context, how will I determine which RDD
>> > belongs to which Kafka topic, so that after processing I can write it
>> > back to its corresponding output topic?
>> > 3. A client may add/delete topics in Kafka; how do I handle that
>> > dynamically in Spark Streaming?
>> > 4. How do I restart the job automatically on failure?
>> >
>> > Any other issues you see here?
>> >
>> > Highly appreciate your response.
>> >
>> > Thanks
>> > Shashi
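P.S. On my parallel DStream question above: with the kafka-0-10 direct stream, each Kafka partition already maps to a Spark partition, so a single stream subscribed to all topics reads in parallel; the multiple-receiver union pattern in the linked guide is mainly for the older receiver-based API. Below is a rough sketch of that single-stream approach, which also uses ConsumerRecord.topic() to route each record back to a matching output topic. Broker address, topic names and the in-/out- naming convention are my assumptions.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object MultiTopicRelay {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("multi-topic-relay"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",            // assumed broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "multi-topic-relay")

    // hypothetical naming convention: in-type-N is written back to out-type-N
    val inputTopics = (1 to 50).map(i => s"in-type-$i")

    // a single direct stream over all input topics; each Kafka partition
    // becomes one Spark partition, so reading is already parallel
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](inputTopics, kafkaParams))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // one producer per partition per batch; a pooled or lazily
        // initialized singleton producer would be cheaper in a real job
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092")
        props.put("key.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props)

        records.foreach { rec =>
          // rec.topic identifies the source topic of every record,
          // so the output topic can be derived per record
          val outTopic  = rec.topic.replace("in-", "out-")
          val processed = rec.value   // stand-in for the real processing
          producer.send(new ProducerRecord(outTopic, rec.key, processed))
        }
        producer.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

On the add/delete-topics question, I believe ConsumerStrategies.SubscribePattern can subscribe by regex (e.g. in-type-.*), so topics created later that match the pattern should be picked up without restarting the job.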
