Re: [DISCUSS] Beam pipeline logical and physical DAGs visualization.

2017-08-03 Thread Robert Bradshaw
Nice. In terms of shared data structures, we have https://github.com/apache/beam/blob/master/sdks/common/runner-api/src/main/proto/beam_runner_api.proto . Presumably a utility that converts this to a dot file would be quite useful. It might be interesting to experiment with different ways of hand

Re: [DISCUSS] Beam pipeline logical and physical DAGs visualization.

2017-08-03 Thread Ahmet Altay
+1, this looks great and it will be very useful for users to understand their pipelines. On Thu, Aug 3, 2017 at 8:25 PM, Pei HE wrote: > Hi all, > While working on JStorm and MapReduce runners, I found that it is very > helpful to understand Beam pipelines by visualizing them. > > Logical graph:

[DISCUSS] Beam pipeline logical and physical DAGs visualization.

2017-08-03 Thread Pei HE
Hi all, While working on JStorm and MapReduce runners, I found that it is very helpful to understand Beam pipelines by visualizing them. Logical graph: https://drive.google.com/file/d/0B6iZ7iRh-LOYc0dUS0Rwb2tvWGM/view?usp=sharing Physical graph: https://drive.google.com/file/d/0B6iZ7iRh-LOYbDFWeD

Re: Requiring PTransform to set a coder on its resulting collections

2017-08-03 Thread Robert Bradshaw
On Thu, Aug 3, 2017 at 6:08 PM, Eugene Kirpichov wrote: > https://github.com/apache/beam/pull/3649 has landed. The main contribution > of this PR is deprecating PTransform.getDefaultOutputCoder(). > > Next steps are to get rid of all setCoder() calls in the SDK, and deprecate > setCoder(). > Nearl

Re: Proposal : An extension for sketch-based statistics

2017-08-03 Thread Anand Iyer
This is awesome!! Very exciting to see the addition of statistical and data-mining algorithms to Apache Beam. On Thu, Aug 3, 2017 at 2:32 PM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > +1, Very exciting! I have some suggestions on the exact API to expose (e.g. > I think it makes se

Re: Requiring PTransform to set a coder on its resulting collections

2017-08-03 Thread Eugene Kirpichov
https://github.com/apache/beam/pull/3649 has landed. The main contribution of this PR is deprecating PTransform.getDefaultOutputCoder(). Next steps are to get rid of all setCoder() calls in the SDK, and deprecate setCoder(). Nearly all setCoder() calls (perhaps simply all?) I found are on the outp

Re: DSL_SQL branch API review

2017-08-03 Thread Mingmin Xu
Thank you @Tyler for gathering the APIs introduced in the SQL DSL; I added some comments in the doc. On Thu, Aug 3, 2017 at 4:21 PM, Tyler Akidau wrote: > Hello Beam dev listers! > > TL;DR - DSL_SQL API review happening at > https://s.apache.org/beam-sql-dsl-api-review > > As one of the last steps towards m

DSL_SQL branch API review

2017-08-03 Thread Tyler Akidau
Hello Beam dev listers! TL;DR - DSL_SQL API review happening at https://s.apache.org/beam-sql-dsl-api-review As one of the last steps towards merging the DSL_SQL branch to master [1], we are now conducting a holistic API review. As part of that, I've created a document [2] that lists out the publ

Re: Proposal : An extension for sketch-based statistics

2017-08-03 Thread Eugene Kirpichov
+1, Very exciting! I have some suggestions on the exact API to expose (e.g. I think it makes sense to expose the CombineFn's directly, so that they can also be used for combining state cells and not just as PTransforms), but that can be handled during regular code review. On Thu, Aug 3, 2017 at 2:
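Exposing the `CombineFn` itself is more flexible than exposing only a `PTransform`, because the same four methods can drive both a bundle-and-merge combine and an incremental per-key state cell. The sketch below assumes plain Python classes that only mirror the method names of Beam's Python `CombineFn` (`create_accumulator`, `add_input`, `merge_accumulators`, `extract_output`). `DistinctCountFn`, `combine_bundles` and `CombiningCell` are hypothetical, and the exact set accumulator stands in for a sketch such as HyperLogLog.

```python
class DistinctCountFn:
    """Hypothetical stand-in mirroring Beam's CombineFn method names."""

    def create_accumulator(self):
        return set()  # placeholder for a sketch accumulator

    def add_input(self, acc, element):
        acc.add(element)
        return acc

    def merge_accumulators(self, accs):
        merged = set()
        for acc in accs:
            merged |= acc
        return merged

    def extract_output(self, acc):
        return len(acc)


def combine_bundles(fn, bundles):
    """PTransform-style use: one accumulator per bundle, then a single merge."""
    accs = []
    for bundle in bundles:
        acc = fn.create_accumulator()
        for element in bundle:
            acc = fn.add_input(acc, element)
        accs.append(acc)
    return fn.extract_output(fn.merge_accumulators(accs))


class CombiningCell:
    """State-cell-style use: the same fn updates one accumulator incrementally."""

    def __init__(self, fn):
        self.fn = fn
        self.acc = fn.create_accumulator()

    def add(self, element):
        self.acc = self.fn.add_input(self.acc, element)

    def read(self):
        return self.fn.extract_output(self.acc)
```

In the Beam Python SDK itself, such a class would subclass `beam.CombineFn`. The same instance could then be handed to `beam.CombineGlobally` or to a `CombiningValueStateSpec`.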

Re: Proposal : An extension for sketch-based statistics

2017-08-03 Thread Sourabh Bajaj
+1 to this. On Thu, Aug 3, 2017 at 6:28 AM Lukasz Cwik wrote: > I'm most interested in the frequency / cardinality tools as it could be > used to help improve performance automatically for combiners by detecting > the few keys case or automatically handle hot keys without needing users to > spec

Re: Should Pipeline wait till all processing time timers fire before exit?

2017-08-03 Thread Lukasz Cwik
On Wed, Jul 26, 2017 at 12:23 PM, Robert Bradshaw < rober...@google.com.invalid> wrote: > On Wed, Jul 26, 2017 at 7:45 AM, Lukasz Cwik > wrote: > > Robert, in your case where output is being produced based upon a > heartbeat, > > either the watermark on the output went to infinity and all that da

Re: [VOTE] Release 2.1.0, release candidate #2

2017-08-03 Thread Jean-Baptiste Onofré
Another quick update. Regarding BEAM-2671, I asked Stas and Aviem for help on this one. It's our top priority, as it's the main blocking issue before cutting RC3. At some point, if we are not able to move fast on this one, I would propose cutting RC3 as it is. Regards JB On 08/02/2017 08:52

Re: Proposal : An extension for sketch-based statistics

2017-08-03 Thread Lukasz Cwik
I'm most interested in the frequency / cardinality tools as it could be used to help improve performance automatically for combiners by detecting the few keys case or automatically handle hot keys without needing users to specify the hints when they use a combiner. On Thu, Aug 3, 2017 at 5:35 AM,
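A Count-Min sketch is one common frequency sketch for spotting hot keys. The proposal does not say it uses one, so treat the following as an illustration of the idea only. In the sketch below, the names `CountMinSketch` and `find_hot_keys` and the SHA-256-based row hashing are assumptions. A Count-Min estimate never falls below a key's true count. Any key that holds at least `fraction` of the stream is therefore reported, and hash collisions can only add false positives.

```python
import hashlib


class CountMinSketch:
    """`depth` rows of `width` counters; each row uses a differently salted hash."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, key):
        # Derive one column per row by salting SHA-256 with the row index.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, count=1):
        for row, col in self._columns(key):
            self.table[row][col] += count

    def estimate(self, key):
        # Minimum over rows: never below the true count, above it only on collisions.
        return min(self.table[row][col] for row, col in self._columns(key))


def find_hot_keys(keys, fraction=0.1, width=2048, depth=4):
    """Return keys whose estimated share of the stream is at least `fraction`."""
    cms = CountMinSketch(width, depth)
    total = 0
    candidates = set()
    for key in keys:
        cms.add(key)
        total += 1
        # A true heavy hitter passes this check at its last occurrence, if not earlier.
        if cms.estimate(key) >= fraction * total:
            candidates.add(key)
    return {k for k in candidates if cms.estimate(k) >= fraction * total}
```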

Re: Proposal : An extension for sketch-based statistics

2017-08-03 Thread Jean-Baptiste Onofré
Nice work Arnaud ;) Happy to have been able to help. Let's see what the others will think about this. Regards JB On 08/03/2017 02:32 PM, Arnaud Fournier wrote: Hello everyone, My name is Arnaud Fournier and I am a CS student. I am currently doing an internship at Talend. With the support of

Re: Requiring PTransform to set a coder on its resulting collections

2017-08-03 Thread Lukasz Cwik
I'm for (1) and am not sure about the feasibility of (2) without having an escape hatch that allows a pipeline author to specify a coder to handle their special case. On Tue, Aug 1, 2017 at 2:15 PM, Reuven Lax wrote: > One interesting wrinkle: I'm about to propose a set of semantics for > snapsh

Proposal : An extension for sketch-based statistics

2017-08-03 Thread Arnaud Fournier
Hello everyone, My name is Arnaud Fournier and I am a CS student. I am currently doing an internship at Talend. With the support of Jean-Baptiste Onofre and Ismaël Mejia, I have been working on statistical analysis of streams with Beam, using probabilistic data structures like HyperLogLog. I wou
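For readers new to HyperLogLog, here is a minimal Python sketch of the estimator. It is not the proposal's implementation. The 64-bit hash taken from SHA-1, the precision `p=12` (4096 registers, a standard error of roughly 1.04/sqrt(4096), about 1.6%) and the class name are all assumptions. Only the small-range (linear counting) correction is included. `merge` takes the register-wise maximum, which is what lets the structure serve as a combiner accumulator.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p=12):
        assert p >= 7, "the alpha formula below assumes m >= 128 registers"
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant alpha_m, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # 64-bit hash taken from the first 8 bytes of SHA-1 (an assumption).
        h = int.from_bytes(hashlib.sha1(str(item).encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)                 # first p bits pick the register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Register-wise max equals the sketch of the union of both inputs.
        assert self.p == other.p
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        z = sum(2.0 ** -r for r in self.registers)
        e = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if e <= 2.5 * self.m and zeros:
            e = self.m * math.log(self.m / zeros)  # linear counting for small ranges
        return e
```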