Hi, I've just started using Storm and have a couple of questions. I have a stream of events, each consisting of a timestamp and a resource-id, and I want to "bucket" them into discrete time buckets, e.g. 1 minute long, and also group on resource-id, so that a resource-id encountered multiple times within the same time bucket is counted only once.
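To make the intended semantics concrete, here is a minimal plain-Java sketch of the result I'm after. It is deliberately not Storm or Trident code; the class and method names are made up for illustration, and it assumes each event arrives as a string "timestamp,id" with the timestamp formatted as yyyy-MM-dd HH:mm:ss,SSS.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper class, only meant to illustrate the desired output.
class TimeBucketSketch {

    // Truncate "yyyy-MM-dd HH:mm:ss,SSS" to its minute: "yyyy-MM-dd HH:mm:00".
    static String toMinuteBucket(String timestamp) {
        return timestamp.substring(0, 16) + ":00";
    }

    // Count the distinct resource-ids seen in each one-minute bucket.
    static Map<String, Integer> countDistinctPerMinute(List<String> events) {
        Map<String, Set<String>> idsPerBucket = new TreeMap<>();
        for (String event : events) {
            // The timestamp itself contains a comma (before the millis),
            // so the id is whatever follows the last comma.
            int lastComma = event.lastIndexOf(',');
            String timestamp = event.substring(0, lastComma);
            String id = event.substring(lastComma + 1);
            idsPerBucket
                .computeIfAbsent(toMinuteBucket(timestamp), k -> new HashSet<>())
                .add(id);
        }
        // TreeMap keeps the buckets sorted by their date string.
        Map<String, Integer> counts = new TreeMap<>();
        idsPerBucket.forEach((bucket, ids) -> counts.put(bucket, ids.size()));
        return counts;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList(
            "2014-03-20 14:18:32,887,1",
            "2014-03-20 14:18:42,887,2",
            "2014-03-20 14:18:52,887,1",
            "2014-03-20 14:19:07,887,1");
        countDistinctPerMinute(events)
            .forEach((bucket, count) -> System.out.println(bucket + "," + count));
    }
}
```

The per-bucket set is what makes a repeated id count only once; that is the step I don't know how to express in Trident.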
I'm mapping the timestamp onto a date string with minute granularity and grouping on that, which works fine. But I don't understand how to add the grouping on resource-id as well. For example, I want the following stream [timestamp,id]:

"2014-03-20 14:18:32,887,1"
"2014-03-20 14:18:42,887,2"
"2014-03-20 14:18:52,887,1"
"2014-03-20 14:18:57,887,1"
"2014-03-20 14:18:58,887,3"
"2014-03-20 14:19:07,887,1"

to result in [timebucket,count]:

"2014-03-20 14:18:00,3"
"2014-03-20 14:19:00,1"

Any ideas? I already implemented this using tick tuples and grouping on resource-id, but I want to use Trident instead and be able to catch up properly if I restart the Storm cluster.

Also, I read in several places that one can have a spout batch by "punctuation", which fits my use case well. But I haven't understood how this can be implemented. Does anybody have any pointers?

Many thanks / Jonas
