[jira] [Commented] (STORM-2258) Streams api - support CoGroupByKey
[ https://issues.apache.org/jira/browse/STORM-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127614#comment-16127614 ] Kamal commented on STORM-2258: -- Modification done as per your review comments. Please take a look. > Streams api - support CoGroupByKey > -- > > Key: STORM-2258 > URL: https://issues.apache.org/jira/browse/STORM-2258 > Project: Apache Storm > Issue Type: Sub-task >Reporter: Arun Mahadevan > > Group together values with same key from both streams. Similar constructs are > supported in beam, spark and flink. > When called on a Stream of (K, V) and (K, W) pairs, return a new Stream of > (K, Seq[V], Seq[W]) tuples > See also - https://cloud.google.com/dataflow/model/group-by-key -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (STORM-2258) Streams api - support CoGroupByKey
[ https://issues.apache.org/jira/browse/STORM-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115915#comment-16115915 ] Kamal commented on STORM-2258: -- Thanks for reviewing the PR. Modified the PR as per your comments. Please review the next iteration. > Streams api - support CoGroupByKey > -- > > Key: STORM-2258 > URL: https://issues.apache.org/jira/browse/STORM-2258 > Project: Apache Storm > Issue Type: Sub-task >Reporter: Arun Mahadevan > > Group together values with same key from both streams. Similar constructs are > supported in beam, spark and flink. > When called on a Stream of (K, V) and (K, W) pairs, return a new Stream of > (K, Seq[V], Seq[W]) tuples > See also - https://cloud.google.com/dataflow/model/group-by-key -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (STORM-2258) Streams api - support CoGroupByKey
[ https://issues.apache.org/jira/browse/STORM-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107432#comment-16107432 ] Kamal commented on STORM-2258: -- Please review the patch https://github.com/apache/storm/pull/2233 > Streams api - support CoGroupByKey > -- > > Key: STORM-2258 > URL: https://issues.apache.org/jira/browse/STORM-2258 > Project: Apache Storm > Issue Type: Sub-task >Reporter: Arun Mahadevan > > Group together values with same key from both streams. Similar constructs are > supported in beam, spark and flink. > When called on a Stream of (K, V) and (K, W) pairs, return a new Stream of > (K, Seq[V], Seq[W]) tuples > See also - https://cloud.google.com/dataflow/model/group-by-key -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (STORM-2258) Streams api - support CoGroupByKey
[ https://issues.apache.org/jira/browse/STORM-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064194#comment-16064194 ] Arun Mahadevan commented on STORM-2258: --- We already have a groupByKey implementation. What we are looking for is a coGroupByKey which is more like a join but instead of returning the cross product of the matching keys, groups together values for the same key from the joined streams. For example, Say stream1 has values - (k1, v1), (k2, v2), (k2, v3) and stream2 has values - (k1, x1), (k1, x2), (k3, x3) The the co-grouped stream would contain - {noformat} (k1, ([v1], [x1, x2]), (k2, ([v2, v3], [])), (k3, ([], [x3])) {noformat} Since you are co-grouping two streams containing key-value pairs, you would define the operation on the PairStream class, with the signature of the method to be something like, {noformat} public PairStream, Iterable>> coGroupByKey(PairStream otherStream) { // ... } {noformat} To get some hints on how to go about implementing this, you can take a look at the implementation of the join operation. (see JoinProcessor.java) > Streams api - support CoGroupByKey > -- > > Key: STORM-2258 > URL: https://issues.apache.org/jira/browse/STORM-2258 > Project: Apache Storm > Issue Type: Sub-task >Reporter: Arun Mahadevan > > Group together values with same key from both streams. Similar constructs are > supported in beam, spark and flink. > When called on a Stream of (K, V) and (K, W) pairs, return a new Stream of > (K, Seq[V], Seq[W]) tuples > See also - https://cloud.google.com/dataflow/model/group-by-key -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (STORM-2258) Streams api - support CoGroupByKey
[ https://issues.apache.org/jira/browse/STORM-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063568#comment-16063568 ] Kamal commented on STORM-2258: -- I am considering to work on this task.Could you please elaborate requirement a bit more. Do we need to add a new method (say groupByKey) in Stream class which groups values for the same key. > Streams api - support CoGroupByKey > -- > > Key: STORM-2258 > URL: https://issues.apache.org/jira/browse/STORM-2258 > Project: Apache Storm > Issue Type: Sub-task >Reporter: Arun Mahadevan > > Group together values with same key from both streams. Similar constructs are > supported in beam, spark and flink. > When called on a Stream of (K, V) and (K, W) pairs, return a new Stream of > (K, Seq[V], Seq[W]) tuples > See also - https://cloud.google.com/dataflow/model/group-by-key -- This message was sent by Atlassian JIRA (v6.4.14#64029)