I agree, Jörn and Satish. I think I should start grouping similar kinds of messages into a single topic, with some kind of type id attached that the Spark Streaming application can then pick up.
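To make that concrete, here is roughly what I am thinking of: a minimal sketch using the kafka-0-10 direct stream, where the type id travels in the Kafka key and the job match-cases on it. The broker address, topic names and processor stubs are all just assumptions for illustration.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object TypedEventRouter {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("typed-event-router"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",            // assumed broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "typed-event-router")

    // one shared input topic; the message type travels in the Kafka key
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](Seq("events-in"), kafkaParams))

    val processed = stream.map { rec =>
      // match-case on the type id to pick the right processor
      rec.key match {
        case "orders"   => ("orders-out",   processOrder(rec.value))
        case "payments" => ("payments-out", processPayment(rec.value))
        case _          => ("deadletter",   rec.value)
      }
    }
    processed.print()   // stand-in; a real job would write to the output topics

    ssc.start()
    ssc.awaitTermination()
  }

  // placeholder processors, for illustration only
  def processOrder(v: String): String   = v
  def processPayment(v: String): String = v
}

In a real job the print() would be replaced by a write back to the corresponding output topic, along the lines of the sketch at the end of this mail.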
I can try to reduce the number of topics significantly, but even then I expect 50+ topics in the cluster. Do you think creating parallel DStreams will help here? (A sketch of what I am considering is at the end of this mail.) See the link below.

https://spark.apache.org/docs/latest/streaming-programming-guide.html#level-of-parallelism-in-data-receiving

Thanks
Shashi

On Wed, Jun 14, 2017 at 8:12 AM, satish lalam <[email protected]> wrote:

> Agree with Jörn. Dynamically creating/deleting topics is nontrivial to
> manage.
> With the limited knowledge about your scenario, it appears that you are
> using topics as some kind of message-type enum.
> If that is the case, you might be better off with one (or just a few)
> topics and a message-type field in the Kafka event itself.
> Your streaming job can then match-case incoming events on this field to
> choose the right processor for each event.
>
> On Tue, Jun 13, 2017 at 1:47 PM, Jörn Franke <[email protected]> wrote:
>
>> I do not fully understand the design here.
>> Why not send everything to one topic with an application id in the
>> message, and also write to one output topic, indicating the application
>> id?
>>
>> Can you elaborate a little bit more on the use case?
>>
>> In particular, applications deleting/creating topics dynamically can be
>> a nightmare to operate.
>>
>> > On 13. Jun 2017, at 22:03, Shashi Vishwakarma <[email protected]> wrote:
>> >
>> > Hi
>> >
>> > I have to design a Spark Streaming application with the use case
>> > below. I am looking for the best possible approach.
>> >
>> > I have an application pushing data into 1000+ different topics, each
>> > with a different purpose. Spark Streaming will receive data from each
>> > topic and, after processing, write it back to a corresponding output
>> > topic.
>> >
>> > Ex.
>> >
>> > Input Type 1 Topic --> Spark Streaming --> Output Type 1 Topic
>> > Input Type 2 Topic --> Spark Streaming --> Output Type 2 Topic
>> > Input Type 3 Topic --> Spark Streaming --> Output Type 3 Topic
>> > .
>> > .
>> > .
>> > Input Type N Topic --> Spark Streaming --> Output Type N Topic, and
>> > so on.
>> >
>> > I need to answer the following questions.
>> >
>> > 1. Is it a good idea to launch 1000+ Spark Streaming applications,
>> > one per topic? Or should I have one streaming application for all
>> > topics, since the processing logic is going to be the same?
>> > 2. If there is one streaming context, how will I determine which RDD
>> > belongs to which Kafka topic, so that after processing I can write it
>> > back to its corresponding output topic?
>> > 3. A client may add/delete topics in Kafka; how do I handle that
>> > dynamically in Spark Streaming?
>> > 4. How do I restart the job automatically on failure?
>> >
>> > Any other issues you see here?
>> >
>> > Highly appreciate your response.
>> >
>> > Thanks
>> > Shashi
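P.S. On my parallel DStream question above: with the kafka-0-10 direct stream, each Kafka partition already maps to a Spark partition, so a single stream subscribed to all topics reads in parallel; the multiple-receiver union pattern in the linked guide is mainly for the older receiver-based API. Below is a rough sketch of that single-stream approach, which also uses ConsumerRecord.topic() to route each record back to a matching output topic. Broker address, topic names and the in-/out- naming convention are my assumptions.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object MultiTopicRelay {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("multi-topic-relay"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",            // assumed broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "multi-topic-relay")

    // hypothetical naming convention: in-type-N is written back to out-type-N
    val inputTopics = (1 to 50).map(i => s"in-type-$i")

    // a single direct stream over all input topics; each Kafka partition
    // becomes one Spark partition, so reading is already parallel
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](inputTopics, kafkaParams))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // one producer per partition per batch; a pooled or lazily
        // initialized singleton producer would be cheaper in a real job
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092")
        props.put("key.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props)

        records.foreach { rec =>
          // rec.topic identifies the source topic of every record,
          // so the output topic can be derived per record
          val outTopic  = rec.topic.replace("in-", "out-")
          val processed = rec.value   // stand-in for the real processing
          producer.send(new ProducerRecord(outTopic, rec.key, processed))
        }
        producer.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

On the add/delete-topics question, I believe ConsumerStrategies.SubscribePattern can subscribe by regex (e.g. in-type-.*), so topics created later that match the pattern should be picked up without restarting the job.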
