Re: repartition on direct kafka stream

2015-09-04 Thread Shushant Arora
Yes agree shuffle data reveals that offsets+data is transformed. Wanted to understand mapPartition or any transformation in ( directKafkaStream.repartition(numexecutors).mapPartitions(...)) is happening before shuffle or after shuffle. If after shuffle - is this due to the reason that very first

Re: repartition on direct kafka stream

2015-09-04 Thread Cody Koeninger
The answer already given is correct. You shouldn't doubt this, because you've already seen the shuffle data change accordingly. On Fri, Sep 4, 2015 at 11:25 AM, Shushant Arora wrote: > But Kafka stream has underlyng RDD which consists of offsets reanges only- > so how does repartition works ? >

Re: repartition on direct kafka stream

2015-09-04 Thread Shushant Arora
But Kafka stream has underlyng RDD which consists of offsets reanges only- so how does repartition works ? 1. First it evaluates the transformation and then repartition 2.or first it repartition and then transform. - In this case data should not be transformed rather offset ranges only should be r

Re: repartition on direct kafka stream

2015-09-03 Thread Saisai Shao
Yes not the offset ranges, but the real data will be shuffled when you using repartition(). Thanks Saisai On Fri, Sep 4, 2015 at 12:42 PM, Shushant Arora wrote: > 1.Does repartitioning on direct kafka stream shuffles only the offsets or > exact kafka messages across executors? > > Say I have a

repartition on direct kafka stream

2015-09-03 Thread Shushant Arora
1.Does repartitioning on direct kafka stream shuffles only the offsets or exact kafka messages across executors? Say I have a direct kafkastream directKafkaStream.repartition(numexecutors).mapPartitions(new FlatMapFunction>, String>(){ ... } Say originally I have 5*numexceutor partitons in kafka