Thanks for bringing these subjects into the discussion, Ismaël.
For the second point about the standard deviation, I just want to add that
this could also be added to the distribution metric.
Actually I think this makes much more sense than just adding a new transform
for this (we can also do both).
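To illustrate why a standard deviation fits naturally next to the count, sum, min and max of a distribution-style metric, here is a minimal sketch in plain Python. It is not Beam API: the `Moments` class and its `add`, `merge` and `stddev` methods are hypothetical names chosen for this example. It only shows that the statistic can be kept as a small accumulator that is updated per element and merged across workers.

```python
import math
from dataclasses import dataclass


@dataclass
class Moments:
    """Hypothetical mergeable accumulator: count, mean, sum of squared deviations."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> "Moments":
        # Welford's online update for one new value.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        return self

    def merge(self, other: "Moments") -> "Moments":
        # Pairwise merge of two partial accumulators (Chan et al.).
        if other.count == 0:
            return Moments(self.count, self.mean, self.m2)
        if self.count == 0:
            return Moments(other.count, other.mean, other.m2)
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return Moments(n, mean, m2)

    def stddev(self) -> float:
        # Population standard deviation; NaN when no value was seen.
        return math.sqrt(self.m2 / self.count) if self.count else float("nan")


def build(values):
    """Fold a list of values into one accumulator."""
    acc = Moments()
    for v in values:
        acc.add(v)
    return acc


# Two partial results, as two workers might produce, merged into one.
left = build([2, 4, 4, 4])
right = build([5, 5, 7, 9])
merged = left.merge(right)
```

Because `merge` is associative and needs only three numbers per partial result, the same accumulator could back either a metric or a combining transform.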
Kenneth’s idea of using sketches for state with the State API is
really interesting and opens up some new use cases. I
haven’t thought much about it yet, but I believe it is an
appealing use case for the sketches. Note that the origin of this work
was in the line of statistics, in
Hello Kenneth, thank you for your answer.
I read your blog post about stateful processing and that is indeed a great
feature!
So if I understood correctly we could use the combineFns to declare
combiningStates so it can be used while processing elements in a DoFn. That
opens up a lot more use
This is a great development! I have wanted Beam to have a library of
sketches.
What Eugene is referring to is the fact that you can write
Combine.perKey(combineFn) to use these in a transform but also
StateSpecs.combiningState(combineFn) to use them in a stateful ParDo. So it
is good to make the
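The point about reusing one combine function in two places can be sketched in plain Python. This is not Beam code: `DistinctCountFn`, `combine_per_key` and `running_counts` are hypothetical names, and an exact `set` stands in for a real sketch such as HyperLogLog. The four method names follow the usual CombineFn shape (create an accumulator, add an input, merge accumulators, extract the output).

```python
from collections import defaultdict


class DistinctCountFn:
    """Hypothetical combine function; an exact set stands in for a sketch."""

    def create_accumulator(self):
        return set()

    def add_input(self, acc, value):
        acc.add(value)
        return acc

    def merge_accumulators(self, accs):
        merged = set()
        for a in accs:
            merged |= a
        return merged

    def extract_output(self, acc):
        return len(acc)


def combine_per_key(pairs, fn):
    """Batch-style use: one accumulator per key, output once at the end."""
    accs = defaultdict(fn.create_accumulator)
    for key, value in pairs:
        accs[key] = fn.add_input(accs[key], value)
    return {key: fn.extract_output(acc) for key, acc in accs.items()}


def running_counts(values, fn):
    """State-style use: keep one accumulator and read it after every element."""
    state = fn.create_accumulator()
    outputs = []
    for value in values:
        state = fn.add_input(state, value)
        outputs.append(fn.extract_output(state))
    return outputs


fn = DistinctCountFn()
per_key = combine_per_key([("a", 1), ("a", 1), ("a", 2), ("b", 3)], fn)
running = running_counts([1, 1, 2, 3, 3], fn)
```

The same object serves both callers unchanged, which is the reason to expose the combine function itself rather than only a finished transform.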
Thanks for your comments, that is very encouraging!
I have created a Jira: https://issues.apache.org/jira/browse/BEAM-2728
and a PR: https://github.com/apache/beam/pull/3686
Eugene and Lucas, I saw that you already have some ideas, so I added you as
reviewers.
I look forward to hearing more from
This is awesome!! Very exciting to see the addition of statistical and
data-mining algorithms to Apache Beam.
On Thu, Aug 3, 2017 at 2:32 PM, Eugene Kirpichov <
kirpic...@google.com.invalid> wrote:
> +1, Very exciting! I have some suggestions on the exact API to expose (e.g.
> I think it makes
I'm most interested in the frequency / cardinality tools, as they could be
used to help improve performance automatically for combiners by detecting
the few-keys case or by automatically handling hot keys, without needing users
to specify hints when they use a combiner.
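As a rough illustration of how a frequency tool could flag hot keys, here is a minimal Misra-Gries heavy-hitters sketch in plain Python. It is an assumption about one possible technique, not what Beam or the proposed library does; the function name `misra_gries` and the parameter `k` are chosen for this example. With `k - 1` counters, any key that occurs in more than `1/k` of the stream is guaranteed to be among the surviving candidates.

```python
def misra_gries(keys, k):
    """Return candidate heavy hitters using at most k - 1 counters.

    Every key whose frequency exceeds len(keys) / k is guaranteed to be
    among the returned candidates; the counts are lower bounds only.
    """
    counters = {}
    for key in keys:
        if key in counters:
            counters[key] += 1
        elif len(counters) < k - 1:
            counters[key] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for candidate in list(counters):
                counters[candidate] -= 1
                if counters[candidate] == 0:
                    del counters[candidate]
    return counters


# A stream where one key ("hot") makes up 60% of 100 elements.
stream = []
for i in range(20):
    stream.extend(["hot", f"k{2 * i}", "hot", f"k{2 * i + 1}", "hot"])

candidates = misra_gries(stream, k=4)
```

The candidates are a superset of the true heavy hitters, so a runner using such a signal would still want a second check (or tolerate false positives) before treating a key as hot.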
On Thu, Aug 3, 2017 at 5:35 AM,
Nice work Arnaud ;)
Happy to have been able to help.
Let's see what the others will think about this.
Regards
JB
On 08/03/2017 02:32 PM, Arnaud Fournier wrote:
Hello everyone,
My name is Arnaud Fournier and I am a CS student. I am currently doing an
internship at Talend.
With the support
Hello everyone,
My name is Arnaud Fournier and I am a CS student. I am currently doing an
internship at Talend.
With the support of Jean-Baptiste Onofre and Ismaël Mejia, I have been
working on statistical analysis of streams with Beam, using probabilistic
data structures like HyperLogLog.
I