Alec,

For this one, I'd recommend Redis HLL (HyperLogLog), as Gna explained earlier.

On 21 Aug 2014 23:31, "Sa Li" <[email protected]> wrote:
> Thanks for all the replies.
>
> I have considered integrating the java-hll package
> (https://github.com/aggregateknowledge/java-hll), which uses the murmur_23
> hash function from Google, but I am getting a lot of exceptions when I
> include it. I am wondering whether this hash is compatible with the
> distributed mechanism of Storm (I might be naive).
>
> Another thing I am considering is TridentReach, which counts the unique
> people exposed to a URL. I am thinking of combining TridentReach with
> KafkaSpout. My question: should I create a fixed-size HashMap to hold each
> URL and an array of its visitors? That would mean the fixed size of the
> hash map represents the window size of the sliding window. I wonder if
> this is correct?
>
> thanks
>
> Alec
>
> On Aug 21, 2014, at 11:18 AM, Nima Movafaghrad <[email protected]> wrote:
>
> Alec,
>
> You can use something like HyperLogLog or Bloom filters to do unique
> and/or distinct counting. Just create a bolt that does that.
>
> Nima
>
> *From:* Sa Li [mailto:[email protected]]
> *Sent:* Wednesday, August 20, 2014 2:45 PM
> *To:* [email protected]
> *Subject:* distinct counting
>
> Hi, all
>
> I know Storm does a good job on counting and other aggregation jobs. I
> wonder if anyone has ever done distinct counting in Storm, and how would
> you set the sliding time window?
>
> thanks
>
> Alec
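
On the sliding-window question, here is a rough sketch of one way it could be structured, not something taken from Storm or TridentReach. Instead of sizing the map by the number of URLs, the window is split into a fixed number of time buckets, each bucket keeps a per-URL set of visitors, and the distinct count for the window is the size of the union across the live buckets. The class name SlidingDistinctCounter and its parameters are made up for illustration. It uses exact HashSets to stay self-contained; with Redis, each set could in principle be swapped for an HLL key per (URL, bucket), filled with PFADD and combined with PFMERGE or a multi-key PFCOUNT.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: distinct visitors per URL over a sliding time window. */
final class SlidingDistinctCounter {
    private final long bucketMillis; // width of one time bucket
    private final int numBuckets;    // window length = bucketMillis * numBuckets

    // bucket index -> (url -> visitors seen during that bucket)
    private final Map<Long, Map<String, Set<String>>> buckets = new HashMap<>();

    SlidingDistinctCounter(long bucketMillis, int numBuckets) {
        if (bucketMillis <= 0 || numBuckets <= 0) {
            throw new IllegalArgumentException("bucket size and count must be positive");
        }
        this.bucketMillis = bucketMillis;
        this.numBuckets = numBuckets;
    }

    /** Record that a visitor saw a URL at the given event time. */
    void add(String url, String visitor, long timestampMillis) {
        long idx = timestampMillis / bucketMillis;
        buckets.computeIfAbsent(idx, k -> new HashMap<>())
               .computeIfAbsent(url, k -> new HashSet<>())
               .add(visitor);
    }

    /** Distinct visitors of a URL within the window that ends at nowMillis. */
    int distinct(String url, long nowMillis) {
        long newest = nowMillis / bucketMillis;
        long oldest = newest - numBuckets + 1;

        // Drop buckets that have slid out of the window.
        buckets.keySet().removeIf(idx -> idx < oldest);

        // Union the per-bucket sets so a repeat visitor is counted once.
        Set<String> union = new HashSet<>();
        for (Map.Entry<Long, Map<String, Set<String>>> e : buckets.entrySet()) {
            if (e.getKey() > newest) {
                continue; // ignore events stamped later than "now"
            }
            Set<String> seen = e.getValue().get(url);
            if (seen != null) {
                union.addAll(seen);
            }
        }
        return union.size();
    }
}
```

In a bolt, add() would be called per tuple and distinct() on a timer or tick. Memory grows with the number of visitors per bucket, which is the main reason to replace the exact sets with an HLL sketch once the volume gets large.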
