Hi, I've just started using Storm and have a couple of questions.

I have a stream of events, each consisting of a timestamp and a resource-id.
I want to "bucket" them into discrete time buckets, e.g. 1 minute long,
and also group on resource-id, so that a resource-id encountered multiple
times during the same time bucket is counted only once.

I'm mapping the timestamp onto a date-string with minute granularity and
grouping on that, which works fine. But I don't understand how to add the
grouping on resource-id as well.

For example, I want the following stream [timestamp,id]:
"2014-03-20 14:18:32,887,1"
"2014-03-20 14:18:42,887,2"
"2014-03-20 14:18:52,887,1"
"2014-03-20 14:18:57,887,1"
"2014-03-20 14:18:58,887,3"
"2014-03-20 14:19:07,887,1"

to result in [timebucket,count]:
"2014-03-20 14:18:00,3"
"2014-03-20 14:19:00,1"
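
To make the intended semantics concrete, here is a minimal plain-Java sketch
of the result I am after. It is not Storm or Trident code, only the logic.
The names BucketCount, toBucket and distinctPerBucket are made up for
illustration, and it assumes the id is everything after the last comma and
that the timestamp always has the "yyyy-MM-dd HH:mm:ss,SSS" layout:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class BucketCount {

    // Truncate "yyyy-MM-dd HH:mm:ss,SSS" to minute granularity by keeping
    // the first 16 characters ("yyyy-MM-dd HH:mm") and appending ":00".
    static String toBucket(String timestamp) {
        return timestamp.substring(0, 16) + ":00";
    }

    // Count the distinct ids seen in each one-minute bucket.
    static Map<String, Integer> distinctPerBucket(List<String> lines) {
        // TreeMap keeps the buckets sorted by their date-string.
        Map<String, Set<String>> idsPerBucket = new TreeMap<>();
        for (String line : lines) {
            // Assumption: the id is the field after the last comma.
            int lastComma = line.lastIndexOf(',');
            String timestamp = line.substring(0, lastComma);
            String id = line.substring(lastComma + 1);
            idsPerBucket
                .computeIfAbsent(toBucket(timestamp), k -> new HashSet<>())
                .add(id); // a Set ignores repeated ids within a bucket
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : idsPerBucket.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>(Arrays.asList(
            "2014-03-20 14:18:32,887,1",
            "2014-03-20 14:18:42,887,2",
            "2014-03-20 14:18:52,887,1",
            "2014-03-20 14:18:57,887,1",
            "2014-03-20 14:18:58,887,3",
            "2014-03-20 14:19:07,887,1"));
        for (Map.Entry<String, Integer> e : distinctPerBucket(events).entrySet()) {
            System.out.println(e.getKey() + "," + e.getValue());
        }
    }
}
```

In other words, the grouping key is effectively the pair (timebucket, id),
followed by a count of the distinct ids per timebucket.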

Any ideas?
I already implemented this using tick-tuples and grouping on resource-id,
but I want to use Trident instead and be able to catch up properly if I
restart the Storm cluster.

Also, I read in several places that one can have a spout batch by
"punctuation", which fits my use case well. But I haven't understood how
this can be implemented. Does anybody have any pointers?


Many thanks / Jonas
