> need to map the keys, modify them
>> and then do a join.

This will always trigger a rebalance. There is no API atm to tell KS
that partitioning is preserved.

Custom partitioner won't help for your case as far as I understand it.


-Matthias

On 12/17/17 9:48 PM, Sameer Kumar wrote:
> Actually, I am doing joining after map. I need to map the keys, modify them
> and then do a join.
> 
> I was thinking of using always passing a partition key based on which
> partition happens.
> Step by step flow is:-
> 1. Data is already partitoned by do userid.
> 2. I do a map to joins impressions tied to a user with view notifications.
> 3. I count valid impressions across different aggregations(i.e. across diff
> dimension groups).
> 
> Thanks,
> -Sameer.
> 
> On Mon, Dec 18, 2017 at 1:37 AM, Matthias J. Sax <matth...@confluent.io>
> wrote:
> 
>> Two comments:
>>
>> 1) As long, as you don't do an aggregation/join after a map(), there
>> will be not repartitioning. Streams does repartitioning "lazy", ie, only
>> if it's required. As long as you only chain filter/map etc, no
>> repartitioning will be done.
>>
>> 2) Can't you use mapValue() instead of map()? If you use map() to only
>> read the key but only modify the value (-> "data is still local") a
>> custom partitioner won't help. Also, we are improving this in upcoming
>> version 1.1 and allows read access to a key in mapValue() (cf. KIP-149
>> for details).
>>
>> Hope this helps.
>>
>>
>> -Matthias
>>
>> On 12/17/17 8:20 AM, Sameer Kumar wrote:
>>> I have multiple map and filter phases in my application dag and though I
>> am
>>> generating different keys at different points, the data is still local.
>>> Re-partitioning for me here is adding unnecessary network shuffling, I
>> want
>>> to minimize it.
>>>
>>> -Sameer.
>>>
>>> On Friday, December 15, 2017, Matthias J. Sax <matth...@confluent.io>
>> wrote:
>>>
>>>> It's not recommended to write a custom partitioner because it's pretty
>>>> difficult to write a correct one. There are many dependencies and you
>>>> need deep knowledge of Kafka Streams internals to get it write.
>>>> Otherwise, your custom partitioner breaks Kafka Streams.
>>>>
>>>> That is the reason why it's not documented...
>>>>
>>>> Not sure so, what you try to achieve in the first place. What do you
>>>> mean by
>>>>
>>>>> I want to make sure that during map phase, the keys
>>>>>> produced adhere to the customized partitioner.
>>>>
>>>> Maybe you achieve what you want differently.
>>>>
>>>>
>>>> -Matthias
>>>>
>>>> On 12/15/17 1:19 AM, Sameer Kumar wrote:
>>>>> Hi,
>>>>>
>>>>> I want to use the custom partitioner in streams, I couldnt find the
>> same
>>>> in
>>>>> the documentation. I want to make sure that during map phase, the keys
>>>>> produced adhere to the customized partitioner.
>>>>>
>>>>> -Sameer.
>>>>>
>>>>
>>>>
>>>
>>
>>
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to