Re: Proposal : An extension for sketch-based statistics

2017-08-16 Thread Arnaud Fournier
Thanks for bringing these subjects into the discussion, Ismaël.

For the second point, about the standard deviation, I just want to add that
it could also be added to the distribution metric.
Actually, I think this makes much more sense than just adding a new transform
for it (we could also do both).

Indeed, we just need to keep track of the sum of squared elements in the
DistributionData.
Then the standard deviation can simply be computed in a method of the
DistributionResult, as is already done for the mean.
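
To make the idea concrete, here is a minimal sketch in plain Java. The class `StdDevAccumulator` is hypothetical and is not Beam's actual DistributionData or DistributionResult; it only illustrates that keeping the sum of squares next to the count and the sum is enough to derive the population standard deviation, and that such an accumulator merges by simple field-wise addition.

```java
/** Hypothetical accumulator for illustration; not a Beam class. */
final class StdDevAccumulator {
    private long count;
    private double sum;
    private double sumOfSquares;

    /** Records one element by updating the three running totals. */
    void add(double value) {
        count++;
        sum += value;
        sumOfSquares += value * value;
    }

    /** Merging is field-wise addition, so partial results combine in any order. */
    StdDevAccumulator merge(StdDevAccumulator other) {
        StdDevAccumulator merged = new StdDevAccumulator();
        merged.count = count + other.count;
        merged.sum = sum + other.sum;
        merged.sumOfSquares = sumOfSquares + other.sumOfSquares;
        return merged;
    }

    double mean() {
        return count == 0 ? Double.NaN : sum / count;
    }

    /** Population standard deviation: sqrt(E[x^2] - E[x]^2). */
    double stdDev() {
        if (count == 0) {
            return Double.NaN;
        }
        double mean = sum / count;
        double variance = sumOfSquares / count - mean * mean;
        // Clamp at zero: rounding error can make the difference slightly negative.
        return Math.sqrt(Math.max(0.0, variance));
    }
}
```

One caveat of this formula is that it can lose precision when the mean is large compared to the spread of the values, so a real implementation may prefer a more numerically stable update.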

I could take care of this.

What do you think about this?

2017-08-14 15:15 GMT+02:00 Ismaël Mejía:

> Kenneth’s idea of using sketches as state with the State API is
> really interesting and opens up some new use cases. I had not
> thought about it before, but I believe it is an appealing use of the
> sketches. Note that this work originated in statistics: in particular,
> we were interested in data sketches (especially the cardinality ones)
> as a ‘lightweight’ way to have approximate metrics.
>
> There are two pending subjects to discuss:
>
> 1. Having sketches as approximate metrics seems interesting; however,
> the current Beam Metrics API does not allow user-defined metrics. I
> don’t really know the details of the current metrics implementation.
> Would it eventually be possible to support this? I mean, to extend
> metrics to reuse something like the sketches extension?
>
> 2. There is also another contribution from Arnaud, in case there is
> interest: a transform for standard deviation. We decided not to
> include it in the sketches extension since it was not consistent with
> the approximate nature of the extension, but I think it could be
> another interesting contribution as a subsequent PR (if there is
> interest in this as well).
>
> Regards,
> Ismaël

Re: Proposal : An extension for sketch-based statistics

2017-08-14 Thread Ismaël Mejía
Kenneth’s idea of using sketches as state with the State API is
really interesting and opens up some new use cases. I had not
thought about it before, but I believe it is an appealing use of the
sketches. Note that this work originated in statistics: in particular,
we were interested in data sketches (especially the cardinality ones)
as a ‘lightweight’ way to have approximate metrics.

There are two pending subjects to discuss:

1. Having sketches as approximate metrics seems interesting; however,
the current Beam Metrics API does not allow user-defined metrics. I
don’t really know the details of the current metrics implementation.
Would it eventually be possible to support this? I mean, to extend
metrics to reuse something like the sketches extension?

2. There is also another contribution from Arnaud, in case there is
interest: a transform for standard deviation. We decided not to
include it in the sketches extension since it was not consistent with
the approximate nature of the extension, but I think it could be
another interesting contribution as a subsequent PR (if there is
interest in this as well).

Regards,
Ismaël

On Sat, Aug 12, 2017 at 11:20 AM, Arnaud Fournier wrote:
> Hello Kenneth, thank you for your answer.
>
> I read your blog post about stateful processing, and that is indeed a great
> feature!
>
> So if I understood correctly, we could use the CombineFns to declare
> combining states so that they can be used while processing elements in a
> DoFn. That opens up a lot more use cases for the sketches!
>
> Actually, this was already possible for 2 of the sketches, but I have now
> refined the constructors of the 2 other sketches, and will do the same for
> the ones to come.
>
>
> Regards,
>
> Arnaud
>
Re: Proposal : An extension for sketch-based statistics

2017-08-12 Thread Arnaud Fournier
Hello Kenneth, thank you for your answer.

I read your blog post about stateful processing, and that is indeed a great
feature!

So if I understood correctly, we could use the CombineFns to declare
combining states so that they can be used while processing elements in a
DoFn. That opens up a lot more use cases for the sketches!

Actually, this was already possible for 2 of the sketches, but I have now
refined the constructors of the 2 other sketches, and will do the same for
the ones to come.


Regards,

Arnaud

2017-08-08 2:07 GMT+02:00 Kenneth Knowles:

> This is a great development! I have wanted Beam to have a library of
> sketches.
>
> What Eugene is referring to is the fact that you can write
> Combine.perKey(combineFn) to use these in a transform but also
> StateSpecs.combiningState(combineFn) to use them in a stateful ParDo. So
> it is good to make the CombineFns public and refine their constructors to
> be user-friendly.
>
> Kenn

Re: Proposal : An extension for sketch-based statistics

2017-08-07 Thread Kenneth Knowles
This is a great development! I have wanted Beam to have a library of
sketches.

What Eugene is referring to is the fact that you can write
Combine.perKey(combineFn) to use these in a transform but also
StateSpecs.combiningState(combineFn) to use them in a stateful ParDo. So it
is good to make the CombineFns public and refine their constructors to be
user-friendly.
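
The point that one combine function can serve both uses can be illustrated without Beam on the classpath. In the sketch below, `SketchFn`, `DistinctCountFn`, `CombineFnUsageDemo` and `CombiningCell` are hypothetical names; `SketchFn` only mirrors the four operations of a Beam CombineFn (createAccumulator, addInput, mergeAccumulators, extractOutput), and an exact HashSet stands in for a real cardinality sketch. It is an illustration of the shape of the API, not the extension's code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in mirroring the four operations of a Beam CombineFn. */
interface SketchFn<InputT, AccumT, OutputT> {
    AccumT createAccumulator();
    AccumT addInput(AccumT accumulator, InputT input);
    AccumT mergeAccumulators(List<AccumT> accumulators);
    OutputT extractOutput(AccumT accumulator);
}

/** Exact distinct count; a real sketch would replace the Set with e.g. HyperLogLog. */
final class DistinctCountFn implements SketchFn<String, Set<String>, Long> {
    public Set<String> createAccumulator() {
        return new HashSet<>();
    }
    public Set<String> addInput(Set<String> accumulator, String input) {
        accumulator.add(input);
        return accumulator;
    }
    public Set<String> mergeAccumulators(List<Set<String>> accumulators) {
        Set<String> merged = createAccumulator();
        for (Set<String> accumulator : accumulators) {
            merged.addAll(accumulator);
        }
        return merged;
    }
    public Long extractOutput(Set<String> accumulator) {
        return (long) accumulator.size();
    }
}

final class CombineFnUsageDemo {
    /** Transform-style use: fold each bundle, then merge the partial accumulators. */
    static <I, A, O> O combineBundles(SketchFn<I, A, O> fn, List<List<I>> bundles) {
        List<A> partials = new ArrayList<>();
        for (List<I> bundle : bundles) {
            A accumulator = fn.createAccumulator();
            for (I input : bundle) {
                accumulator = fn.addInput(accumulator, input);
            }
            partials.add(accumulator);
        }
        return fn.extractOutput(fn.mergeAccumulators(partials));
    }
}

/** State-style use: one long-lived accumulator updated per element, readable at any time. */
final class CombiningCell<I, A, O> {
    private final SketchFn<I, A, O> fn;
    private A accumulator;

    CombiningCell(SketchFn<I, A, O> fn) {
        this.fn = fn;
        this.accumulator = fn.createAccumulator();
    }
    void add(I input) {
        accumulator = fn.addInput(accumulator, input);
    }
    O read() {
        return fn.extractOutput(accumulator);
    }
}
```

Both paths call the same four methods, which is why exposing the combine functions themselves, rather than only wrapping them in a transform, makes them reusable in stateful processing.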

Kenn

On Fri, Aug 4, 2017 at 7:45 AM, Arnaud Fournier wrote:

> Thanks for your comments, that is very encouraging!
>
> I have created a Jira: https://issues.apache.org/jira/browse/BEAM-2728
> and a PR: https://github.com/apache/beam/pull/3686
>
> Eugene and Lukasz, I saw that you already have some ideas, so I added you as
> reviewers.
> I look forward to hearing more from you.
>
> With Ismaël and JB, we had already thought about using some of these
> indicators as metric cells,
> as they can be useful for some kinds of monitoring.
> But I have never heard of state cells. Are they something like the
> QuantileState in ApproximateQuantiles?


Re: Proposal : An extension for sketch-based statistics

2017-08-04 Thread Arnaud Fournier
Thanks for your comments, that is very encouraging!

I have created a Jira: https://issues.apache.org/jira/browse/BEAM-2728
and a PR: https://github.com/apache/beam/pull/3686

Eugene and Lukasz, I saw that you already have some ideas, so I added you as
reviewers.
I look forward to hearing more from you.

With Ismaël and JB, we had already thought about using some of these indicators
as metric cells,
as they can be useful for some kinds of monitoring.
But I have never heard of state cells. Are they something like the
QuantileState in ApproximateQuantiles?



2017-08-04 3:14 GMT+02:00 Anand Iyer:

> This is awesome!! Very exciting to see the addition of statistical and
> data-mining algorithms to Apache Beam.


Re: Proposal : An extension for sketch-based statistics

2017-08-03 Thread Anand Iyer
This is awesome!! Very exciting to see the addition of statistical and
data-mining algorithms to Apache Beam.

On Thu, Aug 3, 2017 at 2:32 PM, Eugene Kirpichov <
kirpic...@google.com.invalid> wrote:

> +1, Very exciting! I have some suggestions on the exact API to expose (e.g.
> I think it makes sense to expose the CombineFns directly, so that they can
> also be used for combining state cells and not just as PTransforms), but
> that can be handled during regular code review.
>
> On Thu, Aug 3, 2017 at 2:23 PM Sourabh Bajaj wrote:
>
> > +1 to this.
> >
> > On Thu, Aug 3, 2017 at 6:28 AM Lukasz Cwik wrote:
> >
> > > I'm most interested in the frequency / cardinality tools as they could be
> > > used to help improve performance automatically for combiners by detecting
> > > the few-keys case, or to automatically handle hot keys without needing
> > > users to specify the hints when they use a combiner.
> > >
> > > On Thu, Aug 3, 2017 at 5:35 AM, Jean-Baptiste Onofré 
> > > wrote:
> > >
> > > > Nice work Arnaud ;)
> > > >
> > > > Happy to have been able to help.
> > > >
> > > > Let's see what the others will think about this.
> > > >
> > > > Regards
> > > > JB
> > > >
> > > >
> > > > On 08/03/2017 02:32 PM, Arnaud Fournier wrote:
> > > >
> > > >> Hello everyone,
> > > >>
> > > >> My name is Arnaud Fournier and I am a CS student. I am currently
> doing
> > > an
> > > >> internship at Talend.
> > > >>
> > > >> With the support of Jean-Baptiste Onofre and Ismaël Mejia, I have
> been
> > > >> working on statistical analysis of streams with Beam, using
> > > probabilistic
> > > >> data structures like HyperLogLog.
> > > >>
> > > >> I would like to share this work with the community, but I wanted
> first
> > > to
> > > >> show you my work in progress and ask you if this humble contribution
> > > could
> > > >> be interesting as an extension.
> > > >>
> > > >> I have made a little doc with more details about what I have done in
> > > case
> > > >> you are interested and want to give me some feedback :
> > > >> *https://docs.google.com/document/d/1Xy6g5RPBYX_HadpIr_2WrUe
> > > >> usiwL0Jo2ACI5PEOP1kc/edit*
> > > >>  > > >> usiwL0Jo2ACI5PEOP1kc/edit>
> > > >>
> > > >> You can also find the current work implementation in progress here
> :
> > > >>
> > > >> https://github.com/ArnaudFnr/beam/tree/sketching/sdks/java/e
> > > >> xtensions/sketching
> > > >>
> > > >>
> > > >>  > > >> extensions/sketching>
> > > >>
> > > >> Thanks !
> > > >>
> > > >> Arnaud
> > > >>
> > > >>
> > > > --
> > > > Jean-Baptiste Onofré
> > > > jbono...@apache.org
> > > > http://blog.nanthrax.net
> > > > Talend - http://www.talend.com
> > > >
> > >
> >
>


Re: Proposal : An extension for sketch-based statistics

2017-08-03 Thread Lukasz Cwik
I'm most interested in the frequency / cardinality tools as they could be
used to help improve performance automatically for combiners by detecting
the few-keys case, or to automatically handle hot keys without needing users
to specify the hints when they use a combiner.
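
As a rough illustration of how a frequency sketch could flag hot keys in bounded memory, here is a minimal Count-Min sketch in plain Java. Everything in it is an assumption made for the example: the class name, the hashing scheme, the dimensions and the `isHot` threshold are not taken from the proposed extension or from Beam's combiner internals.

```java
/** Minimal Count-Min sketch for illustration; all names and parameters are hypothetical. */
final class CountMinSketch {
    private final int depth;
    private final int width;
    private final long[][] counts;
    private long total;

    CountMinSketch(int depth, int width) {
        if (depth <= 0 || width <= 0) {
            throw new IllegalArgumentException("depth and width must be positive");
        }
        this.depth = depth;
        this.width = width;
        this.counts = new long[depth][width];
    }

    /** Derives one bucket per row by mixing the row index into the key's hash. */
    private int bucket(String key, int row) {
        int h = key.hashCode() ^ (row * 0x9E3779B9);
        h ^= h >>> 16;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        return Math.floorMod(h, width);
    }

    void add(String key) {
        for (int row = 0; row < depth; row++) {
            counts[row][bucket(key, row)]++;
        }
        total++;
    }

    /** Never underestimates: collisions can only inflate a counter. */
    long estimate(String key) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < depth; row++) {
            min = Math.min(min, counts[row][bucket(key, row)]);
        }
        return min;
    }

    /** Cell-wise addition merges two sketches built with the same dimensions. */
    void merge(CountMinSketch other) {
        if (other.depth != depth || other.width != width) {
            throw new IllegalArgumentException("sketch dimensions differ");
        }
        for (int row = 0; row < depth; row++) {
            for (int col = 0; col < width; col++) {
                counts[row][col] += other.counts[row][col];
            }
        }
        total += other.total;
    }

    long total() {
        return total;
    }

    /** Flags a key whose estimated share of all elements reaches the given fraction. */
    boolean isHot(String key, double fraction) {
        return total > 0 && estimate(key) >= fraction * total;
    }
}
```

Because estimates can only err on the high side, a check like this may occasionally flag a key that is not actually hot, but it will not miss one that is.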

On Thu, Aug 3, 2017 at 5:35 AM, Jean-Baptiste Onofré wrote:

> Nice work Arnaud ;)
>
> Happy to have been able to help.
>
> Let's see what the others will think about this.
>
> Regards
> JB
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Proposal : An extension for sketch-based statistics

2017-08-03 Thread Jean-Baptiste Onofré

Nice work Arnaud ;)

Happy to have been able to help.

Let's see what the others will think about this.

Regards
JB

On 08/03/2017 02:32 PM, Arnaud Fournier wrote:

Hello everyone,

My name is Arnaud Fournier and I am a CS student. I am currently doing an
internship at Talend.

With the support of Jean-Baptiste Onofre and Ismaël Mejia, I have been
working on statistical analysis of streams with Beam, using probabilistic
data structures like HyperLogLog.

I would like to share this work with the community, but I wanted first to
show you my work in progress and ask you if this humble contribution could
be interesting as an extension.

I have made a little doc with more details about what I have done, in case
you are interested and want to give me some feedback:
https://docs.google.com/document/d/1Xy6g5RPBYX_HadpIr_2WrUeusiwL0Jo2ACI5PEOP1kc/edit

You can also find the current work-in-progress implementation here:

https://github.com/ArnaudFnr/beam/tree/sketching/sdks/java/extensions/sketching

Thanks!

Arnaud



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com