
Re: How does MapWithStateRDD distribute the data

2016-08-03 Thread Cody Koeninger

Are you using KafkaUtils.createDirectStream?

On Wed, Aug 3, 2016 at 9:42 AM, Soumitra Johri wrote:
> Hi,
>
> I am running a streaming job with 4 executors and 16 cores so that each
> executor has two cores to work with. The input Kafka topic has 4 partitions.
> With this configuration I was expecting MapWithStateRDD to be evenly
> distributed across all executors; however, I see that its data is distributed
> across only two executors. Sometimes the data goes to only one executor.
>
> How can this be explained? I am pretty sure there is some math behind
> this behavior.
>
> I am using the standard standalone 1.6.2 cluster.
>
> Thanks
> Soumitra

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: How does MapWithStateRDD distribute the data

2016-08-03 Thread Ben Teeuwen

Did you check the executor logs to see whether the Kafka offsets were pulled
in evenly across the 4 executors?

I recall a similar situation with uneven balancing from a Kafka stream. I
ended up raising the amount of resources (RAM and cores), after which it
balanced out nicely. I don’t understand the mechanism behind it, though.
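
To make the log check above concrete, here is a rough Python sketch for comparing how many messages each Kafka partition contributed to a batch. It assumes the (partition, fromOffset, untilOffset) triples have been copied out of the executor logs by hand. The function names and the sample numbers are invented for illustration and are not taken from this job.

```python
def messages_per_partition(offset_ranges):
    """Map each Kafka partition to the number of messages in its offset range.

    offset_ranges: iterable of (partition, from_offset, until_offset) tuples.
    """
    return {p: until - start for p, start, until in offset_ranges}


def imbalance_ratio(counts):
    """Largest per-partition count divided by the mean count.

    1.0 means perfectly even; 4.0 with four partitions means one
    partition carried everything. Returns 0.0 for an empty batch.
    """
    values = list(counts.values())
    mean = sum(values) / len(values)
    return max(values) / mean if mean else 0.0


if __name__ == "__main__":
    # Hypothetical offset ranges for one batch of a 4-partition topic.
    batch = [(0, 1000, 1250), (1, 980, 1230), (2, 1010, 1260), (3, 990, 1240)]
    counts = messages_per_partition(batch)
    print(counts, imbalance_ratio(counts))
```

A ratio close to 1.0 would suggest the input itself is evenly spread, in which case the skew would have to come from a later stage rather than from Kafka.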

> On Aug 3, 2016, at 4:42 PM, Soumitra Johri wrote:
> 
> Hi,
> 
> I am running a streaming job with 4 executors and 16 cores so that each
> executor has two cores to work with. The input Kafka topic has 4 partitions.
> With this configuration I was expecting MapWithStateRDD to be evenly
> distributed across all executors; however, I see that its data is distributed
> across only two executors. Sometimes the data goes to only one executor.
>
> How can this be explained? I am pretty sure there is some math behind
> this behavior.
> 
> I am using the standard standalone 1.6.2 cluster.
> 
> Thanks
> Soumitra


How does MapWithStateRDD distribute the data

2016-08-03 Thread Soumitra Johri

Hi,

I am running a streaming job with 4 executors and 16 cores so that each
executor has two cores to work with. The input Kafka topic has 4 partitions.
With this configuration I was expecting MapWithStateRDD to be evenly
distributed across all executors; however, I see that its data is distributed
across only two executors. Sometimes the data goes to only one executor.

How can this be explained? I am pretty sure there is some math behind
this behavior.

I am using the standard standalone 1.6.2 cluster.

Thanks
Soumitra
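
On the "math" part of the question, the following is a minimal Python sketch rather than a description of what this job actually does. It assumes the state is partitioned by key hash modulo the partition count, in the style of Spark's HashPartitioner, and it imitates Java's String.hashCode for ASCII keys. The key names and the partition count of 16 are illustrative assumptions. The sketch only shows which partitions end up non-empty; which executor hosts a given partition is decided by the scheduler and is not modelled here.

```python
from collections import Counter


def java_string_hash(s):
    """Imitate Java's String.hashCode using 32-bit signed arithmetic.

    Matches Java for ASCII/BMP text; the thread does not say what the
    real keys look like, so string keys are an assumption.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed int.
    return h - 0x100000000 if h >= 0x80000000 else h


def partition_for(key, num_partitions):
    """Hash-partitioner style assignment: non-negative hash modulo."""
    # Python's % already yields a non-negative result for a positive modulus.
    return java_string_hash(key) % num_partitions


def partition_histogram(keys, num_partitions):
    """Count how many keys land in each partition, index = partition id."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


if __name__ == "__main__":
    # Many distinct keys (hypothetical) spread over 16 partitions.
    print(partition_histogram(["user-%d" % i for i in range(1000)], 16))
    # Only two distinct keys: at most two partitions can hold any state.
    print(partition_histogram(["a", "b"], 16))
```

Under these assumptions, a stream with only a handful of distinct keys can populate no more partitions than it has keys, regardless of how many executors or cores are available.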
