So I am trying to do the following:
stream.groupBy(new Fields("key1"))
      .aggregate(new Fields("activity_a", "activity_b", "secondKey"),
                 new MySum(), new Fields("count_a", "count_b", "second_key"))
      .groupBy(new Fields("second_key"))
      .aggregate(new Fields("count_a", "count_b"),
                 new MyAnotherSum(), new Fields("total_count_a", "total_count_b"))
Does this look like the right approach?
Chen
On Tue, Jan 28, 2014 at 11:55 AM, Chen Wang <[email protected]> wrote:
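
To make the intended result concrete, here is a minimal plain-Java sketch of the two stages (no Storm or Trident calls, just in-memory maps). It assumes each main key ends up with at most one non-null second key and keeps the first one seen; the class and method names (TwoStageGroupBy, countPerKey, totalsPerSecondKey) are made up for illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TwoStageGroupBy {
    /** One input tuple: (key, activity, secondKey); secondKey may be null. */
    public static final class Event {
        final String key;
        final String activity;
        final String secondKey;

        public Event(String key, String activity, String secondKey) {
            this.key = key;
            this.activity = activity;
            this.secondKey = secondKey;
        }
    }

    /** Stage-1 result for one main key. */
    public static final class KeyTotals {
        long countA;
        long countB;
        String secondKey; // assumption: first non-null second key seen for this key
    }

    /** Stage 1: group by main key and count activity_a / activity_b. */
    public static Map<String, KeyTotals> countPerKey(List<Event> events) {
        Map<String, KeyTotals> perKey = new LinkedHashMap<>();
        for (Event e : events) {
            KeyTotals t = perKey.computeIfAbsent(e.key, k -> new KeyTotals());
            if ("activity_a".equals(e.activity)) {
                t.countA++;
            } else if ("activity_b".equals(e.activity)) {
                t.countB++;
            }
            if (t.secondKey == null) {
                t.secondKey = e.secondKey;
            }
        }
        return perKey;
    }

    /** Stage 2: drop keys without a second key, then sum counts per second key. */
    public static Map<String, long[]> totalsPerSecondKey(Map<String, KeyTotals> perKey) {
        Map<String, long[]> totals = new LinkedHashMap<>();
        for (KeyTotals t : perKey.values()) {
            if (t.secondKey == null) {
                continue; // the filter can only run here, after stage 1 has finished
            }
            long[] sum = totals.computeIfAbsent(t.secondKey, k -> new long[2]);
            sum[0] += t.countA; // total_count_a
            sum[1] += t.countB; // total_count_b
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Event> events = new ArrayList<>();
        events.add(new Event("key1", "activity_a", null));
        events.add(new Event("key1", "activity_b", "secondKey1"));
        events.add(new Event("key2", "activity_a", "secondKey1"));
        events.add(new Event("key3", "activity_b", null));

        Map<String, long[]> totals = totalsPerSecondKey(countPerKey(events));
        for (Map.Entry<String, long[]> entry : totals.entrySet()) {
            System.out.println(entry.getKey() + ", " + entry.getValue()[0] + ", " + entry.getValue()[1]);
        }
    }
}
```

The null check sits between the two stages, which is where I would expect a filter on second_key to go in the topology as well.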
> Hi,
> I have a stream source emitting
> key1, activity_a, secondKey1
> key1, activity_b, secondKey2
>
> key2, activity_a, secondKey1
> key2, activity_b, secondKey2
>
> ...
>
> The second key can be null.
>
> I would like to first group by the main key to get the count of each
> activity, so I will get
> key1, count_activity_a, count_activity_b, secondKey
> key2, count_activity_a, count_activity_b, secondKey
>
> Then I would like to group by the second key, ignoring all entries that
> have no secondKey, and sum the counts, so the desired result would be
> secondKey1, count_activity_a, count_activity_b
> secondKey2, count_activity_a, count_activity_b
>
> How can I achieve this in Trident?
> The first step is pretty easy. However, I am not sure how to do the second
> grouping, since I need to filter out the entries that have no second key.
> But until the first-level calculation is done, I will not know whether a
> key has a second key or not.
>
> Thanks much!
> Chen
>
>