Re: [spark-streaming] New directStream API reads topic's partitions sequentially. Why?

2015-09-05 Thread Понькин Алексей
Hi Cody, thank you for the quick response. The problem was that my application did not have enough resources (all executors were busy), so Spark ran these tasks sequentially. When I added more executors to the application, everything worked fine. Thank you anyway. P.S. BTW, thanks for the great video
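The fix makes sense given how Spark schedules a batch: 8 Kafka partitions become 8 tasks, and they run in parallel only if enough executor cores (task slots) are free. A toy sketch of that slot arithmetic (the numbers are illustrative, not from the actual cluster):

```scala
// Toy model of Spark task scheduling: tasks run in "waves", and each wave can
// hold at most (executors * coresPerExecutor) tasks at once. One wave means
// fully parallel; eight waves look like sequential reads of the 8 partitions.
object SchedulingWaves {
  def waves(numTasks: Int, executors: Int, coresPerExecutor: Int): Int = {
    val slots = executors * coresPerExecutor
    math.ceil(numTasks.toDouble / slots).toInt
  }

  def main(args: Array[String]): Unit = {
    // All 8 executors free: 8 tasks fit in one wave -> partitions read in parallel.
    println(waves(numTasks = 8, executors = 8, coresPerExecutor = 1)) // 1
    // Only 1 executor actually available: 8 waves -> reads appear sequential.
    println(waves(numTasks = 8, executors = 1, coresPerExecutor = 1)) // 8
  }
}
```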

Re: [spark-streaming] New directStream API reads topic's partitions sequentially. Why?

2015-09-04 Thread Cody Koeninger
The direct stream just makes a Spark partition per Kafka partition, so if those partitions are not getting evenly distributed among executors, something else is probably wrong with your configuration. If you replace the Kafka stream with a dummy RDD created with e.g. sc.parallelize, what happens?
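A minimal sketch of the suggested experiment, assuming a SparkContext `sc` is already in scope (e.g. in spark-shell); the sleep is just a made-up way to keep tasks visible in the UI:

```scala
// Diagnostic sketch: a dummy RDD with 8 slices should produce 8 tasks spread
// across the 8 executors. If these also pile onto one executor, the problem
// is the cluster/resource configuration, not the Kafka direct stream.
val dummy = sc.parallelize(1 to 8, numSlices = 8)
dummy.foreachPartition { _ =>
  Thread.sleep(5000) // keep each task alive long enough to see its placement in the Spark UI
}
```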

[spark-streaming] New directStream API reads topic's partitions sequentially. Why?

2015-09-04 Thread ponkin
Hi, I am trying to read a Kafka topic with the new directStream method in KafkaUtils. My Kafka topic has 8 partitions. I am running the streaming job on YARN with 8 executors, 1 core each. I noticed that Spark reads all of the topic's partitions in one executor, sequentially - this is obviously not what I expected
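For reference, the setup described above might look roughly like this in the Spark 1.x Scala API; the topic name, broker address, and batch interval below are made-up placeholders, not taken from the actual job:

```scala
// Sketch of the described setup, assuming spark-streaming-kafka 1.x on the classpath.
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("direct-stream-test")
val ssc = new StreamingContext(conf, Seconds(10))

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder broker
val topics = Set("my-topic") // placeholder; 8 Kafka partitions -> 8 Spark partitions per batch

// The direct stream creates one Spark partition per Kafka partition,
// so each batch should produce 8 tasks.
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)

stream.foreachRDD(rdd => println(s"partitions: ${rdd.partitions.length}"))
ssc.start()
ssc.awaitTermination()
```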