Well, Spark Streaming is supposed to create and distribute the Receivers across different cluster nodes. If you are saying that your receivers are actually running on the same node, then that node is probably getting most of the data, in order to minimize network transfer costs.
If you want to distribute your data more evenly, you can partition it explicitly. Also, ask Databricks why the Receivers are not being distributed across different cluster nodes.

From: Laeeq Ahmed [mailto:laeeqsp...@yahoo.com]
Sent: Monday, April 20, 2015 3:57 PM
To: Evo Eftimov; user@spark.apache.org
Subject: Re: Equal number of RDD Blocks

I also see that it's creating both receivers on the same executor, and that might be the cause of having more RDD blocks on one executor than the other. Can I tell Spark to create each receiver on a separate executor?

Regards,
Laeeq

On Monday, April 20, 2015 4:51 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:

And what is the message rate of each topic, mate? That was the other part of the required clarification.

From: Laeeq Ahmed [mailto:laeeqsp...@yahoo.com]
Sent: Monday, April 20, 2015 3:38 PM
To: Evo Eftimov; user@spark.apache.org
Subject: Re: Equal number of RDD Blocks

Hi,

I have two different topics and two Kafka receivers, one for each topic.

Regards,
Laeeq

On Monday, April 20, 2015 4:28 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:

What is meant by "streams" here:

1. Two different DStream Receivers producing two different DStreams, consuming from two different Kafka topics, each with a different message rate
2. One Kafka topic (hence only one message rate to consider) but with two different DStream receivers (i.e. running in parallel), giving rise to two different DStreams

From: Laeeq Ahmed [mailto:laeeqsp...@yahoo.com.INVALID]
Sent: Monday, April 20, 2015 3:15 PM
To: user@spark.apache.org
Subject: Equal number of RDD Blocks

Hi,

I have two streams of data from Kafka. How can I get an approximately equal number of RDD blocks on the two executors? Please see the attachment: one worker has 1785 RDD blocks and the other has 26.

Regards,
Laeeq
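
To make the explicit partitioning suggested at the top of this thread a bit more concrete, here is a toy model in plain Python of what an even redistribution does to the block counts from the original question. It is only a sketch: the executor names and the `repartition_round_robin` helper are made up for illustration, and the round-robin assignment is an assumption that stands in for Spark's real shuffle, which works differently in detail. In an actual job, the corresponding step would presumably be a call to `repartition(numPartitions)` on the DStream.

```python
from collections import Counter


def repartition_round_robin(block_owners, executors):
    """Reassign every block to an executor in round-robin order.

    block_owners: list with one entry per block (its current executor).
    executors:    list of executor names to spread the blocks over.
    Returns a new list of owners with the same length as block_owners.
    """
    return [executors[i % len(executors)] for i in range(len(block_owners))]


# Hypothetical executor names; the block counts are the ones from the question.
executors = ["executor-1", "executor-2"]
before = ["executor-1"] * 1785 + ["executor-2"] * 26

after = repartition_round_robin(before, executors)

print("before:", dict(Counter(before)))  # 1785 blocks vs 26 blocks
print("after: ", dict(Counter(after)))   # 906 blocks vs 905 blocks
```

The model ignores the cost of the redistribution itself. In Spark, moving blocks between executors means a shuffle over the network, so the even spread is traded against exactly the transfer cost mentioned in the first reply.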