Dear list,

A quick question about Spark Streaming:

Say I have this stage set up in my Spark Streaming cluster:

batched TCP stream ==> map(expensive computation) ==> reduceByKey

I know I can set the number of tasks for reduceByKey.
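
Concretely, I mean something like this (a minimal sketch; the host,
port, the parsing, and the task count of 16 are placeholders I made up):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._  // pair DStream ops

    val conf = new SparkConf().setAppName("ParallelismQuestion")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Batched TCP stream (host and port are placeholders)
    val lines = ssc.socketTextStream("localhost", 9999)

    // Stand-in for the expensive per-record computation
    val pairs = lines.map { line =>
      val fields = line.split(",")
      (fields(0), fields(1).toLong)
    }

    // The reduce-side parallelism I can already control explicitly:
    val counts = pairs.reduceByKey(_ + _, 16)

    counts.print()
    ssc.start()
    ssc.awaitTermination()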

But I didn't find a place to specify the parallelism for the input
DStream (the sequence of RDDs generated from the TCP stream). Do I need
to explicitly call repartition() to split the input RDDs into more
partitions? If so, what mechanism is used to split the stream: a fully
random repartition of each (K, V) pair (effectively a shuffle), or
something more like a rebalance?
And what is the default parallelism level for the input stream?
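
If repartition() is indeed the way to go, I'm guessing it would slot
into the sketch above like this (again, 16 is an arbitrary number):

    // Would this raise the parallelism of the expensive map stage,
    // and does it shuffle or just rebalance the records?
    val spread = lines.repartition(16)
    val pairs2 = spread.map { line =>
      val fields = line.split(",")
      (fields(0), fields(1).toLong)  // same stand-in computation as above
    }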

Thank you so much
-Mo
