[jira] [Commented] (STORM-2258) Streams api - support CoGroupByKey

2017-08-15 Thread Kamal (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127614#comment-16127614
 ] 

Kamal commented on STORM-2258:
--

Modification done as per your review comments. Please take a look.

> Streams api - support CoGroupByKey
> --
>
> Key: STORM-2258
> URL: https://issues.apache.org/jira/browse/STORM-2258
> Project: Apache Storm
>  Issue Type: Sub-task
>Reporter: Arun Mahadevan
>
> Group together values with same key from both streams. Similar constructs are 
> supported in beam, spark and flink.
> When called on a Stream of (K, V) and (K, W) pairs, return a new Stream of 
> (K, Seq[V], Seq[W]) tuples
> See also - https://cloud.google.com/dataflow/model/group-by-key



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (STORM-2258) Streams api - support CoGroupByKey

2017-08-06 Thread Kamal (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115915#comment-16115915
 ] 

Kamal commented on STORM-2258:
--

Thanks for reviewing the PR. Modified the PR as per your comments. Please 
review the next iteration.

> Streams api - support CoGroupByKey
> --
>
> Key: STORM-2258
> URL: https://issues.apache.org/jira/browse/STORM-2258
> Project: Apache Storm
>  Issue Type: Sub-task
>Reporter: Arun Mahadevan
>
> Group together values with same key from both streams. Similar constructs are 
> supported in beam, spark and flink.
> When called on a Stream of (K, V) and (K, W) pairs, return a new Stream of 
> (K, Seq[V], Seq[W]) tuples
> See also - https://cloud.google.com/dataflow/model/group-by-key



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (STORM-2258) Streams api - support CoGroupByKey

2017-07-31 Thread Kamal (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107432#comment-16107432
 ] 

Kamal commented on STORM-2258:
--

Please review the patch
https://github.com/apache/storm/pull/2233



> Streams api - support CoGroupByKey
> --
>
> Key: STORM-2258
> URL: https://issues.apache.org/jira/browse/STORM-2258
> Project: Apache Storm
>  Issue Type: Sub-task
>Reporter: Arun Mahadevan
>
> Group together values with same key from both streams. Similar constructs are 
> supported in beam, spark and flink.
> When called on a Stream of (K, V) and (K, W) pairs, return a new Stream of 
> (K, Seq[V], Seq[W]) tuples
> See also - https://cloud.google.com/dataflow/model/group-by-key



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (STORM-2258) Streams api - support CoGroupByKey

2017-06-26 Thread Arun Mahadevan (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064194#comment-16064194
 ] 

Arun Mahadevan commented on STORM-2258:
---

We already have a groupByKey implementation. What we are looking for is a 
coGroupByKey which is more like a join but instead of returning the cross 
product of the matching keys, groups together values for the same key from the 
joined streams. For example,

Say stream1 has values - (k1, v1), (k2, v2), (k2, v3)  
and stream2 has values - (k1, x1), (k1, x2), (k3, x3) 

The the co-grouped stream would contain - 

{noformat}
(k1, ([v1], [x1, x2]), (k2, ([v2, v3], [])), (k3, ([], [x3]))
{noformat}

Since you are co-grouping two streams containing key-value pairs, you would 
define the operation on the PairStream class, with the signature of the method 
to be something like,

{noformat}
public  PairStream, Iterable>> 
coGroupByKey(PairStream otherStream) {
// ...
}
{noformat}

To get some hints on how to go about implementing this, you can take a look at 
the implementation of the join operation. (see JoinProcessor.java)



> Streams api - support CoGroupByKey
> --
>
> Key: STORM-2258
> URL: https://issues.apache.org/jira/browse/STORM-2258
> Project: Apache Storm
>  Issue Type: Sub-task
>Reporter: Arun Mahadevan
>
> Group together values with same key from both streams. Similar constructs are 
> supported in beam, spark and flink.
> When called on a Stream of (K, V) and (K, W) pairs, return a new Stream of 
> (K, Seq[V], Seq[W]) tuples
> See also - https://cloud.google.com/dataflow/model/group-by-key



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (STORM-2258) Streams api - support CoGroupByKey

2017-06-26 Thread Kamal (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063568#comment-16063568
 ] 

Kamal commented on STORM-2258:
--

I am considering to work on this task.Could you please elaborate requirement a 
bit more.
Do we need to add a new method (say groupByKey) in Stream class which groups 
values for the same key.


> Streams api - support CoGroupByKey
> --
>
> Key: STORM-2258
> URL: https://issues.apache.org/jira/browse/STORM-2258
> Project: Apache Storm
>  Issue Type: Sub-task
>Reporter: Arun Mahadevan
>
> Group together values with same key from both streams. Similar constructs are 
> supported in beam, spark and flink.
> When called on a Stream of (K, V) and (K, W) pairs, return a new Stream of 
> (K, Seq[V], Seq[W]) tuples
> See also - https://cloud.google.com/dataflow/model/group-by-key



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)