Thanks for the reply, Sam. At this point I would like to count distinct users. I want to construct a bitmap for each time bucket, for instance 10 minutes. Each bitmap will be stored in the DB, so we can count unique users over any time window. All we need to do in Storm is construct that bitmap; we can run HyperLogLog in the DB, since there are open-source releases available, such as postgresql-hll.
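
To make the idea concrete, here is a minimal sketch of the per-bucket bitmap in plain Java, independent of Storm. It assumes userIDs are small non-negative integers that can be used directly as bit offsets, and that a query window is aligned to bucket boundaries. The class name BucketBitmaps and its methods are hypothetical, made up for illustration; they are not part of Storm or any library.

```java
import java.util.BitSet;
import java.util.TreeMap;

// Hypothetical helper (not a Storm API): one bitmap per fixed-size time bucket.
// Assumes userId is a small non-negative int usable as a bit offset.
class BucketBitmaps {
    static final long BUCKET_MS = 10 * 60 * 1000L; // 10-minute buckets

    // bucket start (epoch ms) -> bitmap where bit i is set if user i was seen
    private final TreeMap<Long, BitSet> buckets = new TreeMap<>();

    // Round a timestamp down to the start of its bucket.
    static long bucketStart(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, BUCKET_MS);
    }

    // Mark userId as seen in the bucket that contains timestampMs.
    void record(long timestampMs, int userId) {
        buckets.computeIfAbsent(bucketStart(timestampMs), k -> new BitSet())
               .set(userId);
    }

    // Distinct users over [fromMs, toMs): OR together every bucket whose
    // start falls in the range, then count the set bits.
    int countUniques(long fromMs, long toMs) {
        BitSet union = new BitSet();
        for (BitSet b : buckets.subMap(fromMs, true, toMs, false).values()) {
            union.or(b);
        }
        return union.cardinality();
    }

    // Render one bucket as a 0/1 string of the given width, e.g. for a text column.
    String render(long bucketStartMs, int width) {
        BitSet b = buckets.getOrDefault(bucketStartMs, new BitSet());
        StringBuilder sb = new StringBuilder(width);
        for (int i = 0; i < width; i++) {
            sb.append(b.get(i) ? '1' : '0');
        }
        return sb.toString();
    }
}
```

Because the bitmaps are exact, a user seen in two buckets is counted once after the OR, which is what makes counting over an arbitrary window of buckets work. If userIDs are sparse or not integers, they would first need to be mapped to dense offsets, which this sketch does not do.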

It would be great if you have already built such a data structure in Storm. My current task is just to build the bitmap. Say we receive 100 events/sec, and each event carries a timestamp, a userID, and an event type. The input to the Storm data structure would then be t_start, t_end, userID, and event type, and the output that I will write to the DB looks like this:

t1:t2 100010101001010
t2:t3 00101010100000101

I am relatively new to Storm. The entire process is: a spout reads the data published to Kafka, a bolt creates the bitmap, and the result is written to the DB. I believe I need separate bolts for these jobs. Please correct me if this logic is wrong; scripting instructions are very welcome.

Thanks,
Alec

On Jul 16, 2014, at 10:50 AM, Sa Li <[email protected]> wrote:

> Hi, All
>
> I like to develop a bitmap to count uniques in bolt, the process is like
> this, spout take the stream from kafka, emit to bolt, bolt will output an
> online user bitmap with predefined time window. My plan is to use bitmap
> structure in redis, say set bit(key, offset, value), where key is user action
> and time window, offset is useriD, value is 1. I know there is a
> storm-redis-pubsub checkin in storm-contrib, but never use it, i wonder if
> anyone ever done or thinking to make bitmap in storm.
>
> thanks
>
> Alec.
