[jira] [Commented] (KAFKA-3561) Auto create through topic for KStream aggregation and join

2016-06-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334446#comment-15334446
 ] 

ASF GitHub Bot commented on KAFKA-3561:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1472


> Auto create through topic for KStream aggregation and join
> --
>
> Key: KAFKA-3561
> URL: https://issues.apache.org/jira/browse/KAFKA-3561
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Damian Guy
>  Labels: api
> Fix For: 0.10.1.0
>
>
> For KStream.join / aggregateByKey operations that requires the streams to be 
> partitioned on the record key, today users should repartition themselves 
> through the "through" call:
> {code}
> stream1 = builder.stream("topic1");
> stream2 = builder.stream("topic2");
> stream3 = stream1.map(/* set the right key for join*/).through("topic3");
> stream4 = stream2.map(/* set the right key for join*/).through("topic4");
> stream3.join(stream4, ..)
> {code}
> This pattern can actually be done by the Streams DSL itself instead of 
> requiring users to specify themselves, i.e. users can just set the right key 
> like (see KAFKA-3430) and then call join, which will be translated by adding 
> the "internal topic for repartition".
> Another thing is that today if user do not call "through" after setting a new 
> key, the aggregation result would not be correct as the aggregation is based 
> on key B while the source partitions is partitioned by key A and hence each 
> task will only get a partial aggregation for all keys. But this is not 
> validated in the DSL today. We should do both the auto-translation and 
> validation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3561) Auto create through topic for KStream aggregation and join

2016-06-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15315908#comment-15315908
 ] 

ASF GitHub Bot commented on KAFKA-3561:
---

GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/1472

KAFKA-3561: Auto create through topic for KStream aggregation and join [WIP]

@guozhangwang @enothereska @mjsax @miguno

If you get a chance can you please take a look at this. I've done the 
repartitioning in the join, but it results in 2 internal topics for each join. 
This seems like overkill as sometimes we wouldn't need to repartition at all, 
others just 1 topic, and then sometimes both, but I'm not sure how we can know 
that. 

I'd also need to implement something similar for leftJoin, but again, i'd 
like to see if i'm heading down the right path or if anyone has any other 
bright ideas.

For reference - https://github.com/apache/kafka/pull/1453 - the previous PR

Thanks for taking the time and looking forward to getting some welcome 
advice :-)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka KAFKA-3561

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1472.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1472


commit 053c809b5aa322327988909d53027d4682df6825
Author: Damian Guy 
Date:   2016-06-01T13:03:23Z

repartition through internal topic on KStream.map()

commit ba1b50285f9eae028e531183500de90a2ae877c7
Author: Damian Guy 
Date:   2016-06-04T15:17:51Z

Merge remote-tracking branch 'upstream/trunk' into KAFKA-3561

commit a8c14f38914410bc3c7ff1e96c040bf1a1992cef
Author: Damian Guy 
Date:   2016-06-05T14:41:12Z

repartition on join

commit 1220c61464676881273e47d7e02ea8d502cd8fd4
Author: Damian Guy 
Date:   2016-06-05T14:51:17Z

repartition on join




> Auto create through topic for KStream aggregation and join
> --
>
> Key: KAFKA-3561
> URL: https://issues.apache.org/jira/browse/KAFKA-3561
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Damian Guy
>  Labels: api
> Fix For: 0.10.1.0
>
>
> For KStream.join / aggregateByKey operations that requires the streams to be 
> partitioned on the record key, today users should repartition themselves 
> through the "through" call:
> {code}
> stream1 = builder.stream("topic1");
> stream2 = builder.stream("topic2");
> stream3 = stream1.map(/* set the right key for join*/).through("topic3");
> stream4 = stream2.map(/* set the right key for join*/).through("topic4");
> stream3.join(stream4, ..)
> {code}
> This pattern can actually be done by the Streams DSL itself instead of 
> requiring users to specify themselves, i.e. users can just set the right key 
> like (see KAFKA-3430) and then call join, which will be translated by adding 
> the "internal topic for repartition".
> Another thing is that today if user do not call "through" after setting a new 
> key, the aggregation result would not be correct as the aggregation is based 
> on key B while the source partitions is partitioned by key A and hence each 
> task will only get a partial aggregation for all keys. But this is not 
> validated in the DSL today. We should do both the auto-translation and 
> validation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3561) Auto create through topic for KStream aggregation and join

2016-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15315545#comment-15315545
 ] 

ASF GitHub Bot commented on KAFKA-3561:
---

Github user dguy closed the pull request at:

https://github.com/apache/kafka/pull/1453


> Auto create through topic for KStream aggregation and join
> --
>
> Key: KAFKA-3561
> URL: https://issues.apache.org/jira/browse/KAFKA-3561
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Damian Guy
>  Labels: api
> Fix For: 0.10.1.0
>
>
> For KStream.join / aggregateByKey operations that requires the streams to be 
> partitioned on the record key, today users should repartition themselves 
> through the "through" call:
> {code}
> stream1 = builder.stream("topic1");
> stream2 = builder.stream("topic2");
> stream3 = stream1.map(/* set the right key for join*/).through("topic3");
> stream4 = stream2.map(/* set the right key for join*/).through("topic4");
> stream3.join(stream4, ..)
> {code}
> This pattern can actually be done by the Streams DSL itself instead of 
> requiring users to specify themselves, i.e. users can just set the right key 
> like (see KAFKA-3430) and then call join, which will be translated by adding 
> the "internal topic for repartition".
> Another thing is that today if user do not call "through" after setting a new 
> key, the aggregation result would not be correct as the aggregation is based 
> on key B while the source partitions is partitioned by key A and hence each 
> task will only get a partial aggregation for all keys. But this is not 
> validated in the DSL today. We should do both the auto-translation and 
> validation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3561) Auto create through topic for KStream aggregation and join

2016-06-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310298#comment-15310298
 ] 

ASF GitHub Bot commented on KAFKA-3561:
---

GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/1453

KAFKA-3561: Auto create through topic for KStream aggregation and join [WIP]

@guozhangwang can you please take a look at this? It is not close to being 
done but i'd like some feedback to ensure i'm heading down the right path and 
understand what this JIRA is asking. The main things to look at are the 
`KStreamImpl.map(...)` method I added and the change made to 
`TopologyBuilder.copartitionGroups`

There is also a test, `KStreamRepartitionMappedKeyTest`, that passes with 
these changes.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka KAFKA-3561

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1453.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1453


commit 053c809b5aa322327988909d53027d4682df6825
Author: Damian Guy 
Date:   2016-06-01T13:03:23Z

repartition through internal topic on KStream.map()




> Auto create through topic for KStream aggregation and join
> --
>
> Key: KAFKA-3561
> URL: https://issues.apache.org/jira/browse/KAFKA-3561
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Damian Guy
>  Labels: api
> Fix For: 0.10.1.0
>
>
> For KStream.join / aggregateByKey operations that requires the streams to be 
> partitioned on the record key, today users should repartition themselves 
> through the "through" call:
> {code}
> stream1 = builder.stream("topic1");
> stream2 = builder.stream("topic2");
> stream3 = stream1.map(/* set the right key for join*/).through("topic3");
> stream4 = stream2.map(/* set the right key for join*/).through("topic4");
> stream3.join(stream4, ..)
> {code}
> This pattern can actually be done by the Streams DSL itself instead of 
> requiring users to specify themselves, i.e. users can just set the right key 
> like (see KAFKA-3430) and then call join, which will be translated by adding 
> the "internal topic for repartition".
> Another thing is that today if user do not call "through" after setting a new 
> key, the aggregation result would not be correct as the aggregation is based 
> on key B while the source partitions is partitioned by key A and hence each 
> task will only get a partial aggregation for all keys. But this is not 
> validated in the DSL today. We should do both the auto-translation and 
> validation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3561) Auto create through topic for KStream aggregation and join

2016-05-27 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304347#comment-15304347
 ] 

Guozhang Wang commented on KAFKA-3561:
--

Hi [~damianguy] a couple thoughts:

1. Topics referenced in {{through}} call are treated as "external topics" that 
owned by users themselves, like topics from {{builder.stream/table}} and {{to}} 
operators. One goal of this ticket about "auto creating repartition topics" is 
to make it as an internal topic just as what we did for {{KTable.aggregation}}. 
So we should not simply just add a "through" operator in between, and requiring 
users to provide the topic name.

2. One related JIRA ticket is https://issues.apache.org/jira/browse/KAFKA-3576: 
we originally want to differentiate KStream and KTable as much as possible for 
their semantic difference at the API layer, but there are suggestions that we 
may actually consider unifying them just for consistent user experience. Would 
like to hear your opinions on that, and if we are really going to do KAFKA-3576 
as well, we'd better do it together with this ticket.

> Auto create through topic for KStream aggregation and join
> --
>
> Key: KAFKA-3561
> URL: https://issues.apache.org/jira/browse/KAFKA-3561
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Damian Guy
>  Labels: api
> Fix For: 0.10.1.0
>
>
> For KStream.join / aggregateByKey operations that requires the streams to be 
> partitioned on the record key, today users should repartition themselves 
> through the "through" call:
> {code}
> stream1 = builder.stream("topic1");
> stream2 = builder.stream("topic2");
> stream3 = stream1.map(/* set the right key for join*/).through("topic3");
> stream4 = stream2.map(/* set the right key for join*/).through("topic4");
> stream3.join(stream4, ..)
> {code}
> This pattern can actually be done by the Streams DSL itself instead of 
> requiring users to specify themselves, i.e. users can just set the right key 
> like (see KAFKA-3430) and then call join, which will be translated by adding 
> the "internal topic for repartition".
> Another thing is that today if user do not call "through" after setting a new 
> key, the aggregation result would not be correct as the aggregation is based 
> on key B while the source partitions is partitioned by key A and hence each 
> task will only get a partial aggregation for all keys. But this is not 
> validated in the DSL today. We should do both the auto-translation and 
> validation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3561) Auto create through topic for KStream aggregation and join

2016-05-27 Thread Damian Guy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303877#comment-15303877
 ] 

Damian Guy commented on KAFKA-3561:
---

Guozhang - i'd like to do this if it is ok with you. I'm going to need a bit of 
guidance though.
It looks to me that I'd need to change the signature of map to something like:

 {code}
 KStream map(final KeyValueMapper> 
mapper,
 final Serde keySerde,
 final Serde valueSerde,
 final String topic);
{code}

and then it could be implemented like so:
{code}
@Override
public  KStream map(final KeyValueMapper> mapper,
final Serde keySerde,
final Serde valueSerde,
final String topic) {
String name = topology.newName(MAP_NAME);
topology.addProcessor(name, new KStreamMap<>(mapper), this.name);
return new KStreamImpl(topology, name, null).through(keySerde, 
valueSerde, topic);
}
{code}

What other methods does this apply to. What am I missing? (I'm sure there is a 
lot of context i'm missing)
Thanks,
Damian

> Auto create through topic for KStream aggregation and join
> --
>
> Key: KAFKA-3561
> URL: https://issues.apache.org/jira/browse/KAFKA-3561
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Damian Guy
>  Labels: api
> Fix For: 0.10.1.0
>
>
> For KStream.join / aggregateByKey operations that requires the streams to be 
> partitioned on the record key, today users should repartition themselves 
> through the "through" call:
> {code}
> stream1 = builder.stream("topic1");
> stream2 = builder.stream("topic2");
> stream3 = stream1.map(/* set the right key for join*/).through("topic3");
> stream4 = stream2.map(/* set the right key for join*/).through("topic4");
> stream3.join(stream4, ..)
> {code}
> This pattern can actually be done by the Streams DSL itself instead of 
> requiring users to specify themselves, i.e. users can just set the right key 
> like (see KAFKA-3430) and then call join, which will be translated by adding 
> the "internal topic for repartition".
> Another thing is that today if user do not call "through" after setting a new 
> key, the aggregation result would not be correct as the aggregation is based 
> on key B while the source partitions is partitioned by key A and hence each 
> task will only get a partial aggregation for all keys. But this is not 
> validated in the DSL today. We should do both the auto-translation and 
> validation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)