The answer already given is correct. You shouldn't doubt this, because you've already seen the shuffle data change accordingly.
On Fri, Sep 4, 2015 at 11:25 AM, Shushant Arora <shushantaror...@gmail.com> wrote: > But Kafka stream has underlyng RDD which consists of offsets reanges only- > so how does repartition works ? > > 1. First it evaluates the transformation and then repartition > 2.or first it repartition and then transform. - In this case data should > not be transformed rather offset ranges only should be repartition and > shuffled. > > > > On Fri, Sep 4, 2015 at 10:24 AM, Saisai Shao <sai.sai.s...@gmail.com> > wrote: > >> Yes not the offset ranges, but the real data will be shuffled when you >> using repartition(). >> >> Thanks >> Saisai >> >> On Fri, Sep 4, 2015 at 12:42 PM, Shushant Arora < >> shushantaror...@gmail.com> wrote: >> >>> 1.Does repartitioning on direct kafka stream shuffles only the offsets >>> or exact kafka messages across executors? >>> >>> Say I have a direct kafkastream >>> >>> directKafkaStream.repartition(numexecutors).mapPartitions(new >>> FlatMapFunction<Iterator<Tuple2<byte[],byte[]>>, String>(){ >>> ... >>> } >>> >>> Say originally I have 5*numexceutor partitons in kafka. >>> >>> Now only the offset ranges should be shuffled to executors not exact >>> kafka messages? But I am seeing a very large size of shuffles data >>> read/write on streaming ui. When I remove this repartition - shuffle read >>> /write becomes 0. >>> >>> >> >