Yes agree shuffle data reveals that offsets+data is transformed.
Wanted to understand mapPartition or any transformation in (
directKafkaStream.repartition(numexecutors).mapPartitions(...)) is
happening before shuffle or after shuffle.
If after shuffle - is this due to the reason that very first
The answer already given is correct. You shouldn't doubt this, because
you've already seen the shuffle data change accordingly.
On Fri, Sep 4, 2015 at 11:25 AM, Shushant Arora
wrote:
> But Kafka stream has underlyng RDD which consists of offsets reanges only-
> so how does repartition works ?
>
But Kafka stream has underlyng RDD which consists of offsets reanges only-
so how does repartition works ?
1. First it evaluates the transformation and then repartition
2.or first it repartition and then transform. - In this case data should
not be transformed rather offset ranges only should be r
Yes not the offset ranges, but the real data will be shuffled when you
using repartition().
Thanks
Saisai
On Fri, Sep 4, 2015 at 12:42 PM, Shushant Arora
wrote:
> 1.Does repartitioning on direct kafka stream shuffles only the offsets or
> exact kafka messages across executors?
>
> Say I have a
1.Does repartitioning on direct kafka stream shuffles only the offsets or
exact kafka messages across executors?
Say I have a direct kafkastream
directKafkaStream.repartition(numexecutors).mapPartitions(new
FlatMapFunction>, String>(){
...
}
Say originally I have 5*numexceutor partitons in kafka