I guess each receiver occupies an executor, so with two receivers there was only one executor left available for processing the job.
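If that's the case, one workaround is to union the receiver streams and repartition before doing the heavy work, so the per-batch tasks aren't pinned to the receivers' executors. Roughly something like the sketch below (not tested against your job; `ssc`, `stream1` and `stream2` are placeholders for the context and the two Kinesis DStreams you already create):

    import org.apache.spark.streaming.StreamingContext
    import org.apache.spark.streaming.dstream.DStream

    // Sketch: union the two receiver streams, then repartition so the
    // per-batch tasks can be scheduled on executors other than the ones
    // running the receivers. Pick a partition count close to the total
    // cores you have for processing.
    def spreadProcessing(ssc: StreamingContext,
                         stream1: DStream[Array[Byte]],
                         stream2: DStream[Array[Byte]]): DStream[Array[Byte]] = {
      ssc.union(Seq(stream1, stream2)).repartition(3)
    }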
On Fri, May 22, 2015 at 1:24 PM, Mike Trienis <mike.trie...@orcsol.com> wrote:
> Hi All,
>
> I have a cluster of four nodes (three workers and one master, with one core
> each) which consumes data from Kinesis at 15 second intervals using two
> streams (i.e. receivers). The job simply grabs the latest batch and pushes
> it to MongoDB. I believe that the problem is that all tasks are executed on
> a single worker node and never distributed to the others. This is true even
> after I set the number of concurrentJobs to 3. Overall, I would really like
> to increase throughput (i.e. more than 500 records / second) and understand
> why all executors are not being utilized.
>
> Here are some parameters I have set:
>
>  - spark.streaming.blockInterval 200
>  - spark.locality.wait 500
>  - spark.streaming.concurrentJobs 3
>
> This is the code that's actually doing the writing:
>
>     def write(rdd: RDD[Data], time: Time): Unit = {
>       val result = doSomething(rdd, time)
>       result.foreachPartition { i =>
>         i.foreach(record => connection.insert(record))
>       }
>     }
>
>     def doSomething(rdd: RDD[Data], time: Time): RDD[MyObject] = {
>       rdd.flatMap(MyObject)
>     }
>
> Any ideas as to how to improve the throughput?
>
> Thanks, Mike.
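One more thing worth checking: in the snippet above, `connection` looks like it is captured from the driver closure. Opening the connection per partition on the executor and batching the inserts usually scales better. A rough sketch below; the MongoDB Java driver usage, host/database/collection names and the `toDocument` helper are my assumptions, not taken from your code:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.Time
    import com.mongodb.MongoClient
    import org.bson.Document
    import scala.collection.JavaConverters._

    // Hypothetical converter from your domain object to a BSON document.
    def toDocument(record: MyObject): Document = ???

    def write(rdd: RDD[MyObject], time: Time): Unit = {
      rdd.foreachPartition { records =>
        // Create (or fetch from a pool) the client on the executor, once per
        // partition, instead of serializing a driver-side `connection`.
        val client = new MongoClient("mongo-host", 27017)
        try {
          val collection = client.getDatabase("mydb").getCollection("mycollection")
          // Batch the writes rather than one insert() call per record.
          records.grouped(1000).foreach { batch =>
            collection.insertMany(batch.map(toDocument).toList.asJava)
          }
        } finally {
          client.close()
        }
      }
    }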