Re: Cleaning up Approximate Algorithms in Beam

2020-10-13 Thread Reza Rokni
Hi, sorry it took almost a year before we found time... https://github.com/apache/beam/pull/12973 (Robin and Andrea have agreed to review). With this PR the old ApproximateUnique will be marked as deprecated, with notes to make use of ApproximateCountDistinct.java

Re: Cleaning up Approximate Algorithms in Beam

2020-02-18 Thread Reza Rokni
Hi, I will be making time for completing this in March, with completion and reviews planned for April.
Cheers
Reza
On Wed, 27 Nov 2019 at 02:29, Robert Bradshaw wrote:
> I think this thread is sufficient.
> On Mon, Nov 25, 2019 at 5:59 PM Reza Rokni wrote:
>> Hi,
>> So do we need a vo

Re: Cleaning up Approximate Algorithms in Beam

2019-11-26 Thread Robert Bradshaw
I think this thread is sufficient.
On Mon, Nov 25, 2019 at 5:59 PM Reza Rokni wrote:
> Hi,
> So do we need a vote for the final list of actions? Or is this thread enough to go ahead and raise the PRs?
> Cheers
> Reza
> On Tue, 26 Nov 2019 at 06:01, Ahmet Altay wrote:
>> On Mo

Re: Cleaning up Approximate Algorithms in Beam

2019-11-25 Thread Reza Rokni
Hi,
So do we need a vote for the final list of actions? Or is this thread enough to go ahead and raise the PRs?
Cheers
Reza
On Tue, 26 Nov 2019 at 06:01, Ahmet Altay wrote:
> On Mon, Nov 18, 2019 at 10:57 AM Robert Bradshaw wrote:
>> On Sun, Nov 17, 2019 at 5:16 PM Reza Rokni wrote

Re: Cleaning up Approximate Algorithms in Beam

2019-11-25 Thread Ahmet Altay
On Mon, Nov 18, 2019 at 10:57 AM Robert Bradshaw wrote:
> On Sun, Nov 17, 2019 at 5:16 PM Reza Rokni wrote:
>> Ahmet: FWIW, there is a Python implementation only for this version:
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/stats.py#L38

Re: Cleaning up Approximate Algorithms in Beam

2019-11-18 Thread Robert Bradshaw
On Sun, Nov 17, 2019 at 5:16 PM Reza Rokni wrote:
> Ahmet: FWIW, there is a Python implementation only for this version:
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/stats.py#L38

Re: Cleaning up Approximate Algorithms in Beam

2019-11-17 Thread Reza Rokni
Ahmet: FWIW, there is a Python implementation only for this version: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/stats.py#L38
Eventually we will be able to make use of
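The linked stats.py presumably holds Beam's Python ApproximateUnique, which (to our understanding) is sample-based: it keeps a bounded set of hash values and extrapolates from how densely they fill the hash space. As a rough illustration of that family, here is a from-scratch k-minimum-values (KMV) estimator. It is not a transcription of stats.py; `hash64` and `KMinValues` are hypothetical names, and blake2b is an assumed choice of 64-bit hash.

```python
import hashlib
import heapq

HASH_SPACE = 2 ** 64


def hash64(element):
    """Hypothetical helper: a deterministic 64-bit hash of repr(element)."""
    digest = hashlib.blake2b(repr(element).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class KMinValues:
    """K-minimum-values sketch: keep the k smallest distinct hashes seen (k >= 2)."""

    def __init__(self, k):
        self.k = k
        self._heap = []     # negated hashes, so -heap[0] is the largest kept hash
        self._kept = set()  # the same hashes, for O(1) duplicate checks

    def _add_hash(self, h):
        if h in self._kept:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._kept.add(h)
        elif h < -self._heap[0]:
            # Evict the current largest kept hash in favour of the smaller new one.
            evicted = -heapq.heapreplace(self._heap, -h)
            self._kept.discard(evicted)
            self._kept.add(h)

    def add(self, element):
        self._add_hash(hash64(element))

    def merge(self, other):
        # The k smallest hashes of a union are among the union of both samples.
        for h in other._kept:
            self._add_hash(h)
        return self

    def estimate(self):
        if len(self._heap) < self.k:
            return len(self._heap)  # fewer than k distinct hashes: count is exact
        kth_smallest = -self._heap[0]
        fraction = (kth_smallest + 1) / HASH_SPACE  # position of k-th hash in (0, 1]
        return round((self.k - 1) / fraction)
```

Because merging only keeps the k smallest hashes of the union, partial sketches built on different workers combine into the same sketch a single pass would have produced; the relative error shrinks roughly as 1/sqrt(k).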

Re: Cleaning up Approximate Algorithms in Beam

2019-11-14 Thread Robert Bradshaw
On Thu, Nov 14, 2019 at 1:06 AM Kenneth Knowles wrote:
> Wow. Nice summary, yes. Major calls to action:
> 0. Never allow a combiner that does not make the format of its state clear in its name/URN. The "update compatibility" problem makes their internal accumulator state essentially part

Re: Cleaning up Approximate Algorithms in Beam

2019-11-14 Thread Kenneth Knowles
Wow. Nice summary, yes. Major calls to action:
0. Never allow a combiner that does not make the format of its state clear in its name/URN. The "update compatibility" problem makes their internal accumulator state essentially part of their public API. Combiners named for what they do are an inhe
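Point 0 treats a combiner's accumulator format as public API. A minimal sketch of what putting the state format into the name/URN could look like, assuming a plain Python class that mirrors the method names of Beam's Python CombineFn (`create_accumulator`, `add_input`, `merge_accumulators`, `extract_output`) without importing Beam. The URN string and the `encode_accumulator`/`decode_accumulator` methods are hypothetical; Beam itself handles accumulator encoding through accumulator coders.

```python
import json


class DistinctSetJsonV1CombineFn:
    """Exact distinct count whose accumulator is a set of strings (hypothetical).

    The URN names the *state format* (a JSON list of strings, version 1), so a
    future sketch-based variant must use a different URN instead of silently
    reinterpreting accumulators written by this one.
    """

    URN = "example:combine:distinct_set_json:v1"  # hypothetical identifier

    def create_accumulator(self):
        return set()

    def add_input(self, accumulator, element):
        accumulator.add(element)
        return accumulator

    def merge_accumulators(self, accumulators):
        merged = set()
        for accumulator in accumulators:
            merged |= accumulator
        return merged

    def extract_output(self, accumulator):
        return len(accumulator)

    def encode_accumulator(self, accumulator):
        # Tag persisted state with the URN so a reader can detect foreign formats.
        payload = {"urn": self.URN, "items": sorted(accumulator)}
        return json.dumps(payload).encode("utf-8")

    def decode_accumulator(self, data):
        payload = json.loads(data.decode("utf-8"))
        if payload.get("urn") != self.URN:
            raise ValueError("incompatible accumulator format: %r" % payload.get("urn"))
        return set(payload["items"])
```

Under this scheme a sketch-based replacement would get its own URN rather than reusing this one, so a pipeline update that swaps implementations fails loudly instead of misreading stored state.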

Re: Cleaning up Approximate Algorithms in Beam

2019-11-13 Thread Reuven Lax
On Wed, Nov 13, 2019 at 9:58 AM Ahmet Altay wrote:
> Thank you for writing this summary.
> On Tue, Nov 12, 2019 at 6:35 PM Reza Rokni wrote:
>> Hi everyone,
>> TL;DR: Discussion on Beam's various Approximate Distinct Count algorithms.
>> Today there are several options for Approxim

Re: Cleaning up Approximate Algorithms in Beam

2019-11-13 Thread Ahmet Altay
Thank you for writing this summary.
On Tue, Nov 12, 2019 at 6:35 PM Reza Rokni wrote:
> Hi everyone,
> TL;DR: Discussion on Beam's various Approximate Distinct Count algorithms.
> Today there are several options for Approximate Algorithms in Apache Beam 2.16, with HLLCount being the most r

Cleaning up Approximate Algorithms in Beam

2019-11-12 Thread Reza Rokni
Hi everyone,
TL;DR: Discussion on Beam's various Approximate Distinct Count algorithms.
Today there are several options for Approximate Algorithms in Apache Beam 2.16, with HLLCount being the most recently added. I would like to canvass opinions here on the possibility of rationalizing these APIs b
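HLLCount belongs to the HyperLogLog family. As a rough illustration of how such sketches work, here is a minimal textbook HyperLogLog written from scratch, assuming a 64-bit blake2b hash and 2**p registers (p >= 7 so the bias constant below applies). It omits the sparse representation and empirical bias correction that HLL++ (the variant HLLCount is based on) adds, so it is not HLLCount's sketch format; `hash64` and `HyperLogLog` are hypothetical names.

```python
import hashlib
import math


def hash64(element):
    """Hypothetical helper: a deterministic 64-bit hash of repr(element)."""
    digest = hashlib.blake2b(repr(element).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class HyperLogLog:
    """Textbook HyperLogLog with 2**p registers (not Beam's HLL++ format)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, element):
        h = hash64(element)
        index = h >> (64 - self.p)             # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # rank = 1-based position of the leftmost 1 bit within `rest`
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def merge(self, other):
        # Sketches with the same p merge losslessly by register-wise max.
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return self

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            # Small-range correction: fall back to linear counting.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)
```

With p=12 the standard error is about 1.04/sqrt(4096), roughly 1.6%. The register-wise max in `merge` is what makes HLL-style sketches a natural fit for a combiner: partial sketches from different workers combine into exactly the sketch a single worker would have built.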