Re: repartition on direct kafka stream
Yes, agreed. The shuffle data shows that the offsets plus the data are being transformed.

I wanted to understand whether mapPartitions, or any other transformation, in directKafkaStream.repartition(numexecutors).mapPartitions(...) happens before the shuffle or after it.

If it happens after the shuffle, is that because the very first transformation or action on directKafkaStream (be it repartition() or mapPartitions()) causes directKafkaStream to be evaluated, then repartition shuffles the data, and then mapPartitions is called on the shuffled data?

On Fri, Sep 4, 2015 at 10:33 PM, Cody Koeninger wrote:
> The answer already given is correct. You shouldn't doubt this, because
> you've already seen the shuffle data change accordingly.
Re: repartition on direct kafka stream
The answer already given is correct. You shouldn't doubt this, because you've already seen the shuffle data change accordingly.

On Fri, Sep 4, 2015 at 11:25 AM, Shushant Arora wrote:
> But the Kafka stream has an underlying RDD that consists of offset ranges
> only, so how does repartition work?
>
> 1. First it evaluates the transformation and then repartitions, or
> 2. First it repartitions and then transforms. In this case the data
>    should not be transformed; only the offset ranges should be
>    repartitioned and shuffled.
Re: repartition on direct kafka stream
But the Kafka stream has an underlying RDD that consists of offset ranges only, so how does repartition work?

1. First it evaluates the transformation and then repartitions, or
2. First it repartitions and then transforms. In this case the data should not be transformed; only the offset ranges should be repartitioned and shuffled.

On Fri, Sep 4, 2015 at 10:24 AM, Saisai Shao wrote:
> Yes, not the offset ranges; the real data will be shuffled when you
> use repartition().
>
> Thanks
> Saisai
Re: repartition on direct kafka stream
Yes, not the offset ranges; the real data will be shuffled when you use repartition().

Thanks
Saisai

On Fri, Sep 4, 2015 at 12:42 PM, Shushant Arora wrote:
> 1. Does repartitioning a direct Kafka stream shuffle only the offsets, or
> the actual Kafka messages, across executors?
>
> Say I have a direct Kafka stream:
>
> directKafkaStream.repartition(numexecutors).mapPartitions(new
> FlatMapFunction<Iterator<Tuple2<...>>, String>(){
> ...
> }
>
> Say I originally have 5*numexecutors partitions in Kafka.
>
> Shouldn't only the offset ranges be shuffled to the executors, not the
> actual Kafka messages? But I am seeing a very large amount of shuffle
> read/write data in the streaming UI. When I remove the repartition,
> shuffle read/write becomes 0.
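
To make the answer in this thread concrete, here is a toy model in plain Java of the order of evaluation. It is only a sketch and does not use Spark: the class ToyOffsetRange and the methods fetch, repartition and mapPartition are hypothetical names invented for illustration, and the global round-robin distribution is a simplification. The point it tries to show is that each offset range is turned into records on the upstream side of the shuffle, the records themselves are what gets redistributed, and the mapPartitions-style function only runs afterwards on the redistributed records.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types or methods are Spark or Kafka APIs.
class RepartitionSketch {

    // Hypothetical stand-in for the offset range of one Kafka partition in a batch.
    static final class ToyOffsetRange {
        final int partition;
        final long fromOffset;   // inclusive
        final long untilOffset;  // exclusive

        ToyOffsetRange(int partition, long fromOffset, long untilOffset) {
            this.partition = partition;
            this.fromOffset = fromOffset;
            this.untilOffset = untilOffset;
        }
    }

    // Upstream side of the shuffle: the offset range is materialized into records.
    // Here the "messages" are simply generated from the offsets.
    static List<String> fetch(ToyOffsetRange range) {
        List<String> records = new ArrayList<>();
        for (long offset = range.fromOffset; offset < range.untilOffset; offset++) {
            records.add("p" + range.partition + "-o" + offset);
        }
        return records;
    }

    // Simulated shuffle: records (not offset ranges) are spread round-robin
    // over numPartitions output partitions.
    static List<List<String>> repartition(List<ToyOffsetRange> ranges, int numPartitions) {
        List<List<String>> output = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            output.add(new ArrayList<>());
        }
        int next = 0;
        for (ToyOffsetRange range : ranges) {
            for (String record : fetch(range)) {
                output.get(next % numPartitions).add(record);
                next++;
            }
        }
        return output;
    }

    // Downstream side of the shuffle: a mapPartitions-style function that
    // receives one partition of already-shuffled records.
    static List<String> mapPartition(List<String> partition) {
        List<String> result = new ArrayList<>();
        for (String record : partition) {
            result.add(record.toUpperCase());
        }
        return result;
    }

    public static void main(String[] args) {
        List<ToyOffsetRange> ranges = List.of(
                new ToyOffsetRange(0, 0L, 3L),
                new ToyOffsetRange(1, 10L, 12L));
        for (List<String> partition : repartition(ranges, 2)) {
            System.out.println(mapPartition(partition));
        }
    }
}
```

In this model, dropping the repartition step would mean calling mapPartition directly on the output of fetch for each range, with no records moved between partitions, which matches the observation above that shuffle read/write goes to 0 without repartition.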