Re: repartition on direct kafka stream

Cody Koeninger Fri, 04 Sep 2015 10:04:26 -0700

The answer already given is correct.  You shouldn't doubt this, because
you've already seen the shuffle data change accordingly.
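
To make the ordering concrete, here is a minimal plain-Java sketch (no Spark
or Kafka dependencies) of why the messages themselves get shuffled. The names
OffsetRange, fetch and repartition are hypothetical stand-ins, not the Spark
API. The assumption being modeled is that each input partition is described by
an offset range, the messages are fetched when that partition is computed, and
repartition sits downstream of that fetch, so it redistributes the fetched
records rather than the ranges.

```java
import java.util.ArrayList;
import java.util.List;

class RepartitionSketch {

    // Hypothetical stand-in for one input partition: just a range of offsets.
    static final class OffsetRange {
        final int partition;
        final long fromOffset;
        final long untilOffset;

        OffsetRange(int partition, long fromOffset, long untilOffset) {
            this.partition = partition;
            this.fromOffset = fromOffset;
            this.untilOffset = untilOffset;
        }
    }

    // Hypothetical stand-in for computing a partition: turns the offset
    // range into actual records (a real job would read them from a broker).
    static List<String> fetch(OffsetRange range) {
        List<String> messages = new ArrayList<>();
        for (long offset = range.fromOffset; offset < range.untilOffset; offset++) {
            messages.add("p" + range.partition + "-offset" + offset);
        }
        return messages;
    }

    // Models the shuffle: every upstream partition is computed first
    // (messages fetched), then each record is sent round-robin to one
    // of the numPartitions output partitions.
    static List<List<String>> repartition(List<OffsetRange> ranges, int numPartitions) {
        List<List<String>> output = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            output.add(new ArrayList<>());
        }
        int next = 0;
        for (OffsetRange range : ranges) {
            for (String message : fetch(range)) {
                output.get(next % numPartitions).add(message);
                next++;
            }
        }
        return output;
    }

    public static void main(String[] args) {
        int numExecutors = 2;
        // 5 * numExecutors input partitions, 3 messages each.
        List<OffsetRange> ranges = new ArrayList<>();
        for (int p = 0; p < 5 * numExecutors; p++) {
            ranges.add(new OffsetRange(p, 0L, 3L));
        }
        List<List<String>> shuffled = repartition(ranges, numExecutors);
        for (int i = 0; i < shuffled.size(); i++) {
            System.out.println("partition " + i + " -> " + shuffled.get(i).size() + " records");
        }
    }
}
```

In this toy model all 30 records cross the shuffle boundary and each of the
two output partitions ends up with 15 of them, which mirrors the non-zero
shuffle read/write that shows up once repartition is added.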


On Fri, Sep 4, 2015 at 11:25 AM, Shushant Arora <shushantaror...@gmail.com>
wrote:

> But the Kafka stream has an underlying RDD which consists of offset ranges
> only, so how does repartition work?
>
> 1. Does it first evaluate the transformation and then repartition,
> 2. or does it first repartition and then transform? In that case the data
> should not be transformed; rather, only the offset ranges should be
> repartitioned and shuffled.
>
>
>
> On Fri, Sep 4, 2015 at 10:24 AM, Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
>
>> Yes, it is not the offset ranges but the real data that will be shuffled
>> when you use repartition().
>>
>> Thanks
>> Saisai
>>
>> On Fri, Sep 4, 2015 at 12:42 PM, Shushant Arora <
>> shushantaror...@gmail.com> wrote:
>>
>>> 1. Does repartitioning a direct Kafka stream shuffle only the offsets
>>> or the actual Kafka messages across executors?
>>>
>>> Say I have a direct Kafka stream:
>>>
>>> directKafkaStream.repartition(numexecutors).mapPartitions(
>>>     new FlatMapFunction<Iterator<Tuple2<byte[],byte[]>>, String>() {
>>>         ...
>>>     });
>>>
>>> Say I originally have 5*numexecutors partitions in Kafka.
>>>
>>> Shouldn't only the offset ranges be shuffled to the executors, not the
>>> actual Kafka messages? But I am seeing a very large amount of shuffle
>>> read/write in the streaming UI. When I remove this repartition, shuffle
>>> read/write drops to 0.
>>>
>>>
>>
>
