I'm not too sure how postgresql-hll works, but I'm assuming you would have to send every tuple to the remote Postgres database, which is very expensive. Whereas if you build your HLL data structure in Storm, you only have to persist the fixed-size serialized version of the HLL to the database once per transaction. That sort of solution scales much better.
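
To make the fixed-size point concrete, here is a rough sketch in Python (Storm itself runs on the JVM, so treat this purely as an illustration). `TinyHLL` and its method names are made up for this example and are not the API of postgresql-hll or any Storm library. It is a simplified HyperLogLog with only the small-range correction, and its byte layout is not compatible with the postgresql-hll storage format. The thing to notice is that `to_bytes()` returns the same number of bytes no matter how many tuples went through `add()`.

```python
import hashlib
import math


class TinyHLL:
    """Simplified HyperLogLog with 2**p one-byte registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = bytearray(self.m)   # fixed-size state

    def add(self, item):
        # 64-bit hash taken from the first 8 bytes of SHA-1.
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        width = 64 - self.p
        idx = h >> width                     # top p bits select a register
        rest = h & ((1 << width) - 1)        # remaining bits
        rank = width - rest.bit_length() + 1 # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Union of two sketches: element-wise max of the registers.
        for i, r in enumerate(other.registers):
            if r > self.registers[i]:
                self.registers[i] = r

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)     # standard constant for m >= 128
        est = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if est <= 2.5 * m and zeros:
            est = m * math.log(m / zeros)    # small-range (linear counting) correction
        return est

    def to_bytes(self):
        # This is all that would need to be written to the database per batch.
        return bytes(self.registers)


# Simulate one batch: 20000 tuples, 5000 distinct ids (made-up data).
hll = TinyHLL()
for i in range(20000):
    hll.add("user-%d" % (i % 5000))
payload = hll.to_bytes()
print(len(payload), round(hll.count()))
```

With p=12 the payload is 4096 bytes per batch whether the batch holds a hundred tuples or a million, so the database sees one small write per transaction instead of one row per tuple. `merge()` is there because sketches built on different workers can be combined before persisting.
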
On Fri, Aug 15, 2014 at 1:42 PM, Sa Li <[email protected]> wrote:

> postgresql-hll: the PostgreSQL extension adding HyperLogLog data
> structures seems pretty good, If we do counting directly in postgresDB.
>
>
> On Fri, Aug 15, 2014 at 1:38 PM, Sa Li <[email protected]> wrote:
>
>> Hi, all
>>
>> Continue this topic, I am bit of confused whether I should implement the
>> hyperloglog in storm or perform the postgresql-hll extension in postgresDB,
>> if I can effectively count the uniques in postgresql-hll, and write into a
>> separate distinct count table, why would I implement that in storm? I know
>> some developers are implementing hll in storm, and I am just unclear what
>> the advantage to do that in storm than in database with hll-extension.
>>
>> thanks
>>
>> Alec
>>
>>
>> On Wed, Aug 13, 2014 at 4:37 PM, Sa Li <[email protected]> wrote:
>>
>>> Hi, All
>>>
>>> I am thinking to implement HyperLoglog by storm with KafkaSpout, and
>>> output not only the distinct counts, but also some kind of bitmap string,
>>> anyone did the similar job, a guide for start is highly appreciated.
>>>
>>> thanks
>>>
>>> Alec
>>>
>>
>>
>
