Regarding your first question, having more partitions than you have executors
usually means better utilization, because the workload is distributed more
evenly.  There is some per-task overhead, but as long as the number of tasks
isn't hugely out of proportion to the number of executor cores, that shouldn't
be a significant problem.
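
To make the arithmetic concrete: 10 executors with 3 cores each give 30 task
slots, so a 300-partition batch runs in roughly 300 / 30 = 10 waves of tasks,
and dropping to 100 Kafka partitions would only help if per-task overhead,
rather than the per-partition work, dominates. Below is a minimal sketch of
the setup under discussion; the broker address, topic name, and batch
interval are my assumptions, not values from your job:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class DirectStreamSketch {
  public static void main(String[] args) throws Exception {
    // 10 executors x 3 cores = 30 concurrent tasks, so a 300-partition
    // topic is processed in about 10 waves per batch.
    SparkConf conf = new SparkConf().setAppName("direct-kafka-sketch");
    JavaStreamingContext jssc =
        new JavaStreamingContext(conf, Durations.seconds(5)); // assumed interval

    Map<String, String> kafkaParams = new HashMap<String, String>();
    kafkaParams.put("metadata.broker.list", "broker1:9092"); // assumed broker
    Set<String> topics = Collections.singleton("mytopic");   // assumed topic

    // The direct API creates one RDD partition per Kafka partition.
    JavaPairInputDStream<String, String> directKafkaStream =
        KafkaUtils.createDirectStream(jssc, String.class, String.class,
            StringDecoder.class, StringDecoder.class, kafkaParams, topics);

    // print() is the output operation that emits the "Time: ... ms" banners.
    directKafkaStream.count().print();

    jssc.start();
    jssc.awaitTermination();
  }
}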

I don't really understand your second question.

On Sat, Jul 11, 2015 at 5:00 AM, Shushant Arora <shushantaror...@gmail.com>
wrote:

> 1. Spark Streaming 1.3 creates as many RDD partitions as there are Kafka
> partitions in the topic. Say I have 300 partitions in the topic and 10
> executors, each with 3 cores. Does that mean only 10*3 = 30 partitions are
> processed at a time, then the next 30, and so on? Since executors launch
> tasks per RDD partition, I need 300 tasks in total, but since I have 30
> cores (10 executors with 3 cores each), these tasks will execute 30 at a
> time until all 300 are done.
>
> So will reducing the number of Kafka partitions to, say, 100 speed up the
> processing?
>
> 2. In my Spark Streaming job I process the Kafka stream using foreachRDD:
>
> directKafkaStream.foreachRDD(new Function<JavaPairRDD<String, String>, Void>() {
>   public Void call(JavaPairRDD<String, String> rdd) {
>     rdd.foreachPartition(new VoidFunction<Iterator<Tuple2<String, String>>>() {
>       public void call(Iterator<Tuple2<String, String>> partition) {
>         // ..process partition
>       }
>     });
>     return null;
>   }
> });
>
> Since foreachRDD is an output operation, it spawns a Spark job, but the
> timings of these jobs are not printed on the driver console the way they
> are with map and print, i.e.:
>
>
> -------------------------------------------
> Time: 1429054870000 ms
> -------------------------------------------
> -------------------------------------------
> Time: 1429054871000 ms
> -------------------------------------------
>
> ..................
>
> Why is it so?
>
>
> Thanks
> Shushant
