Hi Cody,
Thank you for the quick response.
The problem was that my application did not have enough resources (all executors
were busy), so Spark decided to run these tasks sequentially. When I added more
executors to the application, everything worked fine.
Thank you anyway.
P.S. BTW, thank you for the great video.
The direct stream just makes a Spark partition per Kafka partition, so if
those partitions are not getting evenly distributed among executors,
something else is probably wrong with your configuration.
If you replace the kafka stream with a dummy rdd created with e.g.
sc.parallelize, what happens?
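A minimal sketch of that test, assuming a Spark Streaming app: swap the Kafka stream for a `queueStream` fed by a `sc.parallelize` RDD with the same partition count, and log which host processes each partition. The app name, batch interval, and partition counts here are illustrative, not from the original thread.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

object PartitionDistributionTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("partition-distribution-test")
    val ssc = new StreamingContext(conf, Seconds(5))
    val sc = ssc.sparkContext

    // Dummy RDD with 8 partitions, mirroring the 8 Kafka partitions
    val dummy = sc.parallelize(1 to 1000, numSlices = 8)

    // queueStream turns the RDD into a stream, standing in for the Kafka stream
    val stream = ssc.queueStream(mutable.Queue(dummy))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { _ =>
        // Runs on the executor; shows which host handled this partition
        println(s"partition processed on ${java.net.InetAddress.getLocalHost.getHostName}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

If the dummy partitions also pile up on one executor, the problem is in the cluster/scheduler configuration rather than the Kafka integration.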
Hi,
I am trying to read a Kafka topic with the new directStream method in KafkaUtils.
I have a Kafka topic with 8 partitions.
I am running the streaming job on YARN with 8 executors, 1 core for each
one.
I noticed that Spark reads all the topic's partitions in one executor
sequentially - this is obviously not what I expected.
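For reference, a minimal sketch of the setup described above, assuming the spark-streaming-kafka 0.8 direct API of that era; the broker address and topic name are placeholders, not from the original message.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("direct-stream-example")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Placeholder broker list; the direct stream creates one Spark partition
    // per Kafka partition, so an 8-partition topic yields 8-partition RDDs
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))

    stream.foreachRDD { rdd =>
      println(s"batch with ${rdd.partitions.length} partitions")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```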