Thanks Uri, I came across that and took a quick look; it seems interesting.
On a related note, it would be quite cool to have a sort of port of Algebird (or at least count-min, top-k, and HLL, perhaps Bloom filter) to Python, with monoid-style implementations for use in PySpark...

On Sat, Feb 1, 2014 at 2:34 AM, Uri Laserson <[email protected]> wrote:
> Hi everyone,
> I implemented a version of distributed streaming quantiles for PySpark. It
> uses a count-min sketch approach. You can find the code here:
> https://github.com/laserson/dsq
> Thought it might be of interest...
> Uri
> --
> Uri Laserson, PhD
> Data Scientist, Cloudera
> Twitter/GitHub: @laserson
> +1 617 910 0447
> [email protected]
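
To make "monoid-style" a bit more concrete, here is a rough sketch of what a count-min sketch with a zero element and an associative merge might look like in plain Python. This is only an illustration: the class name, method names, and the MD5-based hashing are my own assumptions, not Algebird's API and not the code in dsq.

```python
import hashlib
from functools import reduce


class CountMinSketch:
    """Hypothetical count-min sketch with a monoid-style interface:
    zero() is the identity element and merge() is associative."""

    def __init__(self, depth=4, width=1000, table=None):
        self.depth = depth
        self.width = width
        # One row of counters per hash function.
        self.table = table if table is not None else [[0] * width for _ in range(depth)]

    @classmethod
    def zero(cls, depth=4, width=1000):
        # Identity element: an empty sketch.
        return cls(depth, width)

    def _index(self, item, row):
        # Assumption: salt an MD5 digest with the row number to get
        # one hash function per row. Any decent hash family would do.
        digest = hashlib.md5("{}:{}".format(row, item).encode("utf-8")).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        # Updates this sketch in place and returns it, so it can be
        # used as a fold step.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count
        return self

    def merge(self, other):
        # Monoid "plus": element-wise sum of the counter tables.
        if (self.depth, self.width) != (other.depth, other.width):
            raise ValueError("sketches must share depth and width")
        table = [[a + b for a, b in zip(r1, r2)]
                 for r1, r2 in zip(self.table, other.table)]
        return CountMinSketch(self.depth, self.width, table)

    def estimate(self, item):
        # Never underestimates; may overestimate on hash collisions.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))


# Simulate three partitions: build one sketch per partition, then merge.
partitions = [["a", "b", "a"], ["a", "c"], ["b", "a"]]
sketches = [reduce(lambda s, x: s.add(x), part, CountMinSketch.zero())
            for part in partitions]
total = reduce(lambda s, t: s.merge(t), sketches, CountMinSketch.zero())
print(total.estimate("a"))  # at least 4, the true count of "a"
```

In PySpark the same shape would presumably map onto something like rdd.aggregate(CountMinSketch.zero(), lambda s, x: s.add(x), lambda a, b: a.merge(b)), assuming the class is importable on the workers. HLL, top-k, and Bloom filters could follow the same zero/add/merge pattern.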
