Thanks for all the replies. I have considered integrating the java-hll package (https://github.com/aggregateknowledge/java-hll), which uses the murmur3_128 hash function from Google's Guava library, but I am getting a lot of exceptions when I try to include it. I am wondering whether this hash is compatible with the distributed mechanism of Storm (I might be naive).
Another thing I am considering is TridentReach, which counts the unique people exposed to a URL. I would like to combine TridentReach with KafkaSpout. My question: should I create a fixed-size HashMap that holds each URL and an array of its visitors? That would mean the fixed size of the hash map represents the window size of the sliding window. Is this correct?

thanks

Alec

On Aug 21, 2014, at 11:18 AM, Nima Movafaghrad <[email protected]> wrote:

> Alec,
>
> You can use something like HyperLogLog or Bloomfilters to do Unique and/or
> Distinct counting. Just create a bolt that does that.
>
> Nima
>
> From: Sa Li [mailto:[email protected]]
> Sent: Wednesday, August 20, 2014 2:45 PM
> To: [email protected]
> Subject: distinct counting
>
> Hi, all
>
> I know storm does good job on counting and other aggregate jobs, I wonder if
> anyone ever did distinct counting in storm, and how would you set the time
> sliding window?
>
> thanks
>
>
> Alec
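For the sliding-window question raised in this thread, here is a minimal sketch in plain Java. It assumes the window is a fixed number of time buckets, so the window length is the bucket count and not the number of entries in the map. The class `SlidingDistinctCounter` and its methods are hypothetical names for illustration and are not part of Storm or Trident. It keeps exact visitor sets per URL; each set could probably be swapped for a mergeable sketch such as HyperLogLog if memory becomes a concern.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: exact distinct visitors per URL over the last N time buckets. */
public class SlidingDistinctCounter {
    private final int windowBuckets;
    // Oldest bucket first. Each bucket maps url -> visitor ids seen in that time slot.
    private final Deque<Map<String, Set<String>>> buckets = new ArrayDeque<>();

    public SlidingDistinctCounter(int windowBuckets) {
        if (windowBuckets < 1) {
            throw new IllegalArgumentException("windowBuckets must be >= 1");
        }
        this.windowBuckets = windowBuckets;
        buckets.addLast(new HashMap<>());
    }

    /** Record one visit in the current (newest) bucket. */
    public void record(String url, String visitor) {
        buckets.peekLast().computeIfAbsent(url, k -> new HashSet<>()).add(visitor);
    }

    /** Slide the window by one slot: open a new bucket and drop the oldest if needed. */
    public void advance() {
        buckets.addLast(new HashMap<>());
        if (buckets.size() > windowBuckets) {
            buckets.removeFirst();
        }
    }

    /** Distinct visitors of a URL across all buckets still inside the window. */
    public int distinct(String url) {
        Set<String> union = new HashSet<>();
        for (Map<String, Set<String>> bucket : buckets) {
            Set<String> visitors = bucket.get(url);
            if (visitors != null) {
                union.addAll(visitors);
            }
        }
        return union.size();
    }
}
```

Inside a bolt, `record` would be called for each incoming tuple and `advance` once per time slot, for example when a tick tuple arrives. The window size is then the bucket count multiplied by the slot length.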
