Not separately at the level of `flatMap` and `map`. The number of partitions in the RDDs those operations work on determines the potential parallelism; the number of worker cores available determines how much of that potential is actually realized.
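In other words, you tune parallelism by controlling partition counts rather than per-operator thread counts. A minimal sketch of the same word-count example, assuming a Spark version where `DStream.repartition` is available (the partition count of 8 is an arbitrary illustration, and you would pick it based on your cluster's cores):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("WordCount")
val ssc = new StreamingContext(conf, Seconds(1))

val lines = ssc.socketTextStream(args(1), args(2).toInt)

// flatMap and map inherit the partitioning of their parent DStream;
// to raise their parallelism, repartition the stream upstream of them.
val words = lines.flatMap(_.split(" ")).repartition(8)

// reduceByKey takes an optional numPartitions argument that sets the
// parallelism of the shuffle stage directly.
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _, 8)

wordCounts.print()
ssc.start()
ssc.awaitTermination()
```

Each of the 8 partitions can then be processed by a separate task, so up to 8 cores work on those stages concurrently.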
On Tue, Oct 22, 2013 at 7:24 AM, Ryan Chan <[email protected]> wrote:

> In storm, you can control the number of thread with the setSpout/setBolt,
> and how to do the same with Spark Streaming?
>
> e.g.
>
> val lines = ssc.socketTextStream(args(1), args(2).toInt)
> val words = lines.flatMap(_.split(" "))
> val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
> wordCounts.print()
> ssc.start()
>
> Sound like I cannot tell Spark to tell how many thread to be used with
> `flatMap` and how many thread to be used with `map` etc, right?
