I guess each receiver occupies an executor, so with two receivers there was only one executor left available for processing the job.
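If that's the case, one workaround is to union the receiver streams and repartition before doing the heavy work, so the per-batch tasks aren't pinned to the receivers' executors. Roughly something like the sketch below (not tested against your job; `ssc`, `stream1` and `stream2` are placeholders for the context and the two Kinesis DStreams you already create):

    import org.apache.spark.streaming.StreamingContext
    import org.apache.spark.streaming.dstream.DStream

    // Sketch: union the two receiver streams, then repartition so the
    // per-batch tasks can be scheduled on executors other than the ones
    // running the receivers. Pick a partition count close to the total
    // cores you have for processing.
    def spreadProcessing(ssc: StreamingContext,
                         stream1: DStream[Array[Byte]],
                         stream2: DStream[Array[Byte]]): DStream[Array[Byte]] = {
      ssc.union(Seq(stream1, stream2)).repartition(3)
    }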
On Fri, May 22, 2015 at 1:24 PM, Mike Trienis <mike.trie...@orcsol.com> wrote:
> Hi All,
>
> I have a cluster of four nodes (three workers and one master, with one core
> each) which consumes data from Kinesis at 15 second intervals using two
> streams (i.e. receivers). The job simply grabs the latest batch and pushes
> it to MongoDB. I believe that the problem is that all tasks are executed on
> a single worker node and never distributed to the others. This is true even
> after I set the number of concurrentJobs to 3. Overall, I would really like
> to increase throughput (i.e. more than 500 records / second) and understand
> why all executors are not being utilized.
>
> Here are some parameters I have set:
>
>  - spark.streaming.blockInterval 200
>  - spark.locality.wait 500
>  - spark.streaming.concurrentJobs 3
>
> This is the code that's actually doing the writing:
>
>     def write(rdd: RDD[Data], time: Time): Unit = {
>       val result = doSomething(rdd, time)
>       result.foreachPartition { i =>
>         i.foreach(record => connection.insert(record))
>       }
>     }
>
>     def doSomething(rdd: RDD[Data], time: Time): RDD[MyObject] = {
>       rdd.flatMap(MyObject)
>     }
>
> Any ideas as to how to improve the throughput?
>
> Thanks, Mike.
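One more thing worth checking: in the snippet above, `connection` looks like it is captured from the driver closure. Opening the connection per partition on the executor and batching the inserts usually scales better. A rough sketch below; the MongoDB Java driver usage, host/database/collection names and the `toDocument` helper are my assumptions, not taken from your code:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.Time
    import com.mongodb.MongoClient
    import org.bson.Document
    import scala.collection.JavaConverters._

    // Hypothetical converter from your domain object to a BSON document.
    def toDocument(record: MyObject): Document = ???

    def write(rdd: RDD[MyObject], time: Time): Unit = {
      rdd.foreachPartition { records =>
        // Create (or fetch from a pool) the client on the executor, once per
        // partition, instead of serializing a driver-side `connection`.
        val client = new MongoClient("mongo-host", 27017)
        try {
          val collection = client.getDatabase("mydb").getCollection("mycollection")
          // Batch the writes rather than one insert() call per record.
          records.grouped(1000).foreach { batch =>
            collection.insertMany(batch.map(toDocument).toList.asJava)
          }
        } finally {
          client.close()
        }
      }
    }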