What is called Bolt in Storm is essentially a combination of 
[Transformation/Action and DStream RDD] in Spark – so to achieve a higher 
parallelism for specific Transformation/Action on specific Dstream RDD simply 
repartition it to the required number of partitions which directly relates to 
the corresponding number of Threads   

 

From: anshu shukla [mailto:anshushuk...@gmail.com] 
Sent: Wednesday, May 6, 2015 9:33 AM
To: ayan guha
Cc: user@spark.apache.org; d...@spark.apache.org
Subject: Re: Creating topology in spark streaming

 

But main problem is how to increase the level of parallelism  for any 
particular bolt logic .

 

suppose i  want  this type of topology .

 

https://storm.apache.org/documentation/images/topology.png

 

How we can manage it .

 

On Wed, May 6, 2015 at 1:36 PM, ayan guha <guha.a...@gmail.com> wrote:

Every transformation on a dstream will create another dstream. You may want to 
take a look at foreachrdd? Also, kindly share your code so people can help 
better

On 6 May 2015 17:54, "anshu shukla" <anshushuk...@gmail.com> wrote:

Please help  guys, Even  After going through all the examples given i have not 
understood how to pass the  D-streams  from one bolt/logic to other (without 
writing it on HDFS etc.) just like emit function in storm .

Suppose i have topology with 3  bolts(say) 

 

BOLT1(parse the tweets nd emit tweet using given hashtags)=====>Bolt2(Complex 
logic for sentiment analysis over tweets)=======>BOLT3(submit tweets to the sql 
database using spark SQL)


 

 

Now  since Sentiment analysis will take most of the time ,we have to increase 
its level of parallelism for tuning latency. Howe to increase the levele of 
parallelism since the logic of topology is not clear .

 

-- 

Thanks & Regards,
Anshu Shukla

Indian Institute of Sciences





 

-- 

Thanks & Regards,
Anshu Shukla

Reply via email to