Alec,

For this one, I'd recommend Redis HLL (HyperLogLog), like Gna explained earlier.
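
To make the sliding-window part concrete, here is a rough sketch of one way to do it, not a tested Storm component. It splits time into fixed-width buckets, keeps one set of visitors per bucket, and drops buckets once they fall out of the window. The class name SlidingDistinctCounter and its parameters are made up for illustration. It uses exact HashSets so it runs with the JDK alone; with Redis, each bucket would instead be one HyperLogLog key, filled with PFADD and read with PFCOUNT over the keys in the window. You would keep one such counter per URL.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: sliding-window distinct count built from time buckets. */
public class SlidingDistinctCounter {

    /** One fixed-width slice of time and the visitors seen in it. */
    private static final class Bucket {
        final long index;
        final Set<String> members = new HashSet<>();

        Bucket(long index) {
            this.index = index;
        }
    }

    private final long bucketMillis;      // width of one bucket
    private final int bucketsInWindow;    // window size = bucketMillis * bucketsInWindow
    private final Deque<Bucket> buckets = new ArrayDeque<>();

    public SlidingDistinctCounter(long bucketMillis, int bucketsInWindow) {
        this.bucketMillis = bucketMillis;
        this.bucketsInWindow = bucketsInWindow;
    }

    /** Records a visitor at the given event time. */
    public void add(String visitor, long timestampMillis) {
        long index = timestampMillis / bucketMillis;
        Bucket last = buckets.peekLast();
        if (last == null || last.index < index) {
            last = new Bucket(index);
            buckets.addLast(last);
        }
        // Simplification: a late (out-of-order) event lands in the newest bucket.
        last.members.add(visitor);
        evict(index);
    }

    /** Returns the number of distinct visitors in the window ending at nowMillis. */
    public int distinct(long nowMillis) {
        evict(nowMillis / bucketMillis);
        Set<String> union = new HashSet<>();
        for (Bucket b : buckets) {
            union.addAll(b.members);
        }
        return union.size();
    }

    /** Drops buckets that are older than the window. */
    private void evict(long currentIndex) {
        while (!buckets.isEmpty()
                && buckets.peekFirst().index <= currentIndex - bucketsInWindow) {
            buckets.removeFirst();
        }
    }
}
```

The point of the buckets is that the window slides by discarding whole buckets, so memory is bounded by the window length rather than by a fixed-size map of visitors.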
On 21 Aug 2014 23:31, "Sa Li" <[email protected]> wrote:

> Thanks for all the replies.
>
> I have considered integrating the java-hll package (
> https://github.com/aggregateknowledge/java-hll), which uses the murmur_23
> hash function from Google, but I get a lot of exceptions when I include
> it. I am wondering whether this hash is compatible with Storm's distributed
> mechanism (I might be naive).
>
> Another thing I am considering is TridentReach, which counts the unique
> people exposed to a URL. I am thinking of combining TridentReach with
> kafkaSpout. My question: should I create a fixed-size HashMap that holds
> each URL and an array of its visitors, so that the fixed size of the hash
> map represents the window size of the sliding window? I wonder if this is
> correct.
>
>
> thanks
>
> Alec
>
> On Aug 21, 2014, at 11:18 AM, Nima Movafaghrad <
> [email protected]> wrote:
>
> Alec,
>
> You can use something like HyperLogLog or Bloom filters to do unique and/or
> distinct counting. Just create a bolt that does that.
>
> Nima
>
> *From:* Sa Li [mailto:[email protected] <[email protected]>]
> *Sent:* Wednesday, August 20, 2014 2:45 PM
> *To:* [email protected]
> *Subject:* distinct counting
>
> Hi, all
>
> I know Storm does a good job of counting and other aggregation jobs. I
> wonder whether anyone has done distinct counting in Storm, and how you
> would set the sliding time window?
>
> thanks
>
>
> Alec
>
>
>
