Re: Contributing Beam Kata (Java & Python)

2019-05-14 Thread hsuryawirawan
Thanks for merging it Reuven! Quick question, would it be useful if we write a blog post on the Kata so that we can build more awareness for people to try out? I've also uploaded the course to Stepik which has seamless integration within the IDE for people to easily start the course. On

Re: [ANNOUNCE] New PMC Member: Pablo Estrada

2019-05-14 Thread Robert Burke
Woohoo! Well deserved. On Tue, May 14, 2019, 8:34 PM Reuven Lax wrote: > Congratulations! > > *From: *Mikhail Gryzykhin > *Date: *Tue, May 14, 2019 at 8:32 PM > *To: * > > Congratulations Pablo! >> >> On Tue, May 14, 2019, 20:25 Kenneth Knowles wrote: >> >>> Hi all, >>> >>> Please join me

Re: [ANNOUNCE] New PMC Member: Pablo Estrada

2019-05-14 Thread Reuven Lax
Congratulations! *From: *Mikhail Gryzykhin *Date: *Tue, May 14, 2019 at 8:32 PM *To: * Congratulations Pablo! > > On Tue, May 14, 2019, 20:25 Kenneth Knowles wrote: > >> Hi all, >> >> Please join me and the rest of the Beam PMC in welcoming Pablo Estrada to >> join the PMC. >> >> Pablo first

Re: [ANNOUNCE] New PMC Member: Pablo Estrada

2019-05-14 Thread Mikhail Gryzykhin
Congratulations Pablo! On Tue, May 14, 2019, 20:25 Kenneth Knowles wrote: > Hi all, > > Please join me and the rest of the Beam PMC in welcoming Pablo Estrada to > join the PMC. > > Pablo first picked up BEAM-722 in October of 2016 and has been a steady > part of the Beam community since then.

Re: [ANNOUNCE] New PMC Member: Pablo Estrada

2019-05-14 Thread Boyuan Zhang
Congratulations Pablo! So well deserved! *From: *Kenneth Knowles *Date: *Tue, May 14, 2019 at 8:25 PM *To: *dev Hi all, > > Please join me and the rest of the Beam PMC in welcoming Pablo Estrada to > join the PMC. > > Pablo first picked up BEAM-722 in October of 2016 and has been a steady >

Re: [ANNOUNCE] New PMC Member: Pablo Estrada

2019-05-14 Thread Reza Rokni
Awesome news :-) *From: *Kenneth Knowles *Date: *Wed, 15 May 2019, 11:25 *To: *dev Hi all, > > Please join me and the rest of the Beam PMC in welcoming Pablo Estrada to > join the PMC. > > Pablo first picked up BEAM-722 in October of 2016 and has been a steady > part of the Beam community since

[ANNOUNCE] New PMC Member: Pablo Estrada

2019-05-14 Thread Kenneth Knowles
Hi all, Please join me and the rest of the Beam PMC in welcoming Pablo Estrada to join the PMC. Pablo first picked up BEAM-722 in October of 2016 and has been a steady part of the Beam community since then. In addition to technical work on Beam Python & Java & runners, I would highlight how

Re: SqlTransform Metadata

2019-05-14 Thread Reza Rokni
Hi, One use case would be when dealing with the windowing functions, for example:
SELECT f_int, COUNT(*), TUMBLE_START(f_timestamp, INTERVAL '1' HOUR) tumble_start
FROM PCOLLECTION
GROUP BY f_int, TUMBLE(f_timestamp, INTERVAL '1' HOUR)
For an element which is using Metadata to
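
As an aside for readers unfamiliar with tumbling windows, the grouping in the query above can be sketched in plain Python. This is only an illustration of the semantics (fixed one-hour windows aligned to the Unix epoch, one count per key and window), not how Beam SQL implements it; the helper names tumble_start and count_per_window are hypothetical and chosen to mirror the query.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

HOUR = timedelta(hours=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tumble_start(ts, size=HOUR):
    """Return the start of the fixed-size window containing ts.

    Assumption: windows are aligned to the Unix epoch.
    """
    return ts - (ts - EPOCH) % size


def count_per_window(rows, size=HOUR):
    """Count rows per (f_int, window start).

    rows is an iterable of (f_int, f_timestamp) pairs with
    timezone-aware timestamps.
    """
    return Counter((f_int, tumble_start(ts, size)) for f_int, ts in rows)


if __name__ == "__main__":
    utc = timezone.utc
    sample = [
        (1, datetime(2019, 5, 14, 20, 5, tzinfo=utc)),
        (1, datetime(2019, 5, 14, 20, 55, tzinfo=utc)),
        (1, datetime(2019, 5, 14, 21, 10, tzinfo=utc)),
        (2, datetime(2019, 5, 14, 20, 30, tzinfo=utc)),
    ]
    for (key, start), n in sorted(count_per_window(sample).items()):
        print(key, start.isoformat(), n)
```

Each output row corresponds to one (f_int, tumble_start) group in the SQL, which is why the window start must be derived from a timestamp that is available to the query.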

Re: Schema is not final... is it allowed to override

2019-05-14 Thread Reuven Lax
*From: *Alex Van Boxel *Date: *Tue, May 14, 2019 at 3:38 PM *To: *ML Beam/Dev ProtoBuf and certainly the Descriptor is a challenging beast, and I > certainly want to support DynamicMessage (see also my ProtoCoder PR). > > Creating a schema from the proto is easy, the trick is creating the >

Re: All validates runner tests seems to be broken.

2019-05-14 Thread Michael Luckey
To clarify: These failures are expected, and there is not much we can do about them. It is just that I forgot about the consequences for the ongoing release. (Sorry again, Ankur!) Apart from that, testing was done on a local Jenkins setup to not mess with the global configuration. As this would have blocked

Re: All validates runner tests seems to be broken.

2019-05-14 Thread Andrew Pilloud
The specific issue in your text appears to be a typo introduced in https://github.com/apache/beam/pull/8194 While that PR ran a bunch of tests, I didn't see a reference to "Run Seed Job", which means it didn't actually test the code in the change. I expect not all the failures are the same as

Re: Schema is not final... is it allowed to override

2019-05-14 Thread Alex Van Boxel
ProtoBuf and certainly the Descriptor is a challenging beast, and I certainly want to support DynamicMessage (see also my ProtoCoder PR). Creating a schema from the proto is easy; the trick is creating the to/fromRow. With precompiled protos I can easily get the Descriptor from the class, but the

Re: [VOTE] Remove deprecated Java Reference Runner code from repository.

2019-05-14 Thread Ruoyun Huang
+1 *From: *Daniel Oliveira *Date: *Tue, May 14, 2019 at 2:19 PM *To: *dev Hello everyone, > > I'm calling for a vote on removing the deprecated Java Reference Runner > code. The PR for the change has already been tested and reviewed: > https://github.com/apache/beam/pull/8380 > > [ ] +1,

Re: All validates runner tests seems to be broken.

2019-05-14 Thread Michael Luckey
Created reverting PR https://github.com/apache/beam/pull/8581 You might approve and merge. On Tue, May 14, 2019 at 11:19 PM Ankur Goenka wrote: > Ahh, I see. Good point. > so shall we revert test-infra? > > *From: *Alan Myrvold > *Date: *Tue, May 14, 2019 at 2:16 PM > *To: * > > Other ones

Re: [VOTE] Remove deprecated Java Reference Runner code from repository.

2019-05-14 Thread Andrew Pilloud
+1 for deleting code. *From: *Ahmet Altay *Date: *Tue, May 14, 2019 at 2:22 PM *To: *dev +1 > > *From: *Lukasz Cwik > *Date: *Tue, May 14, 2019 at 2:20 PM > *To: *dev > > +1 >> >> *From: *Daniel Oliveira >> *Date: *Tue, May 14, 2019 at 2:19 PM >> *To: *dev >> >> Hello everyone, >>> >>> I'm

Re: [VOTE] Remove deprecated Java Reference Runner code from repository.

2019-05-14 Thread Pablo Estrada
+1 *From: *Lukasz Cwik *Date: *Tue, May 14, 2019 at 2:20 PM *To: *dev +1 > > *From: *Daniel Oliveira > *Date: *Tue, May 14, 2019 at 2:19 PM > *To: *dev > > Hello everyone, >> >> I'm calling for a vote on removing the deprecated Java Reference Runner >> code. The PR for the change has already

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-14 Thread Brian Hulette
To close the loop on this: Rui just added a check that rejects distinct aggregations for now[1]. I wrote up BEAM-7306[2] to track this feature going forward. [1] https://github.com/apache/beam/pull/8498 [2] https://issues.apache.org/jira/browse/BEAM-7306 *From: *Mingmin Xu *Date: *Mon, May 6,

Re: [VOTE] Remove deprecated Java Reference Runner code from repository.

2019-05-14 Thread Ahmet Altay
+1 *From: *Lukasz Cwik *Date: *Tue, May 14, 2019 at 2:20 PM *To: *dev +1 > > *From: *Daniel Oliveira > *Date: *Tue, May 14, 2019 at 2:19 PM > *To: *dev > > Hello everyone, >> >> I'm calling for a vote on removing the deprecated Java Reference Runner >> code. The PR for the change has already

Re: [VOTE] Remove deprecated Java Reference Runner code from repository.

2019-05-14 Thread Lukasz Cwik
+1 *From: *Daniel Oliveira *Date: *Tue, May 14, 2019 at 2:19 PM *To: *dev Hello everyone, > > I'm calling for a vote on removing the deprecated Java Reference Runner > code. The PR for the change has already been tested and reviewed: > https://github.com/apache/beam/pull/8380 > > [ ] +1,

Re: All validates runner tests seems to be broken.

2019-05-14 Thread Ankur Goenka
It seems to be related. I will try to rerun the seed job. *From: *Lukasz Cwik *Date: *Tue, May 14, 2019 at 1:52 PM *To: *dev It's likely related to the rename done in > https://github.com/apache/beam/commit/f198de033824949eb66ea533ae8a40b8dd8cd7fe#diff-2bb618406f7ee4470a48343283f368a2 > Kenn is

Re: All validates runner tests seems to be broken.

2019-05-14 Thread Michael Luckey
Unfortunately, I missed the fact that seed job triggers automatically. Yes, you need to run the seed job on your branch to replace the old commands. We might consider resetting to legacy commands, i.e. revert ./test-infra folder. What do you think? On Tue, May 14, 2019 at 11:02 PM Andrew

Re: All validates runner tests seems to be broken.

2019-05-14 Thread Alan Myrvold
Other ones are failing on the branch due to the 2.13.0 branch not having https://github.com/apache/beam/pull/8194, and the seed job running from master. *From: *Michael Luckey *Date: *Tue, May 14, 2019 at 2:13 PM *To: * Unfortunately, I missed the fact that seed job triggers automatically. > >

[VOTE] Remove deprecated Java Reference Runner code from repository.

2019-05-14 Thread Daniel Oliveira
Hello everyone, I'm calling for a vote on removing the deprecated Java Reference Runner code. The PR for the change has already been tested and reviewed: https://github.com/apache/beam/pull/8380 [ ] +1, Approve merging the removal PR in its current state [ ] -1, Veto the removal PR (please

Re: All validates runner tests seems to be broken.

2019-05-14 Thread Ankur Goenka
Yup, running the seed job. @Alan Myrvold The other tests, which do not have a typo, are also failing. *From: *Andrew Pilloud *Date: *Tue, May 14, 2019 at 2:02 PM *To: *dev So it sounds like a number of the failures are related to a single jenkins > config for all branches. This means you can't

Re: All validates runner tests seems to be broken.

2019-05-14 Thread Ankur Goenka
Ahh, I see. Good point. So shall we revert test-infra? *From: *Alan Myrvold *Date: *Tue, May 14, 2019 at 2:16 PM *To: * Other ones are failing on the branch due to the 2.13.0 branch not having > https://github.com/apache/beam/pull/8194, and the seed job running from > master. > > *From:

Re: All validates runner tests seems to be broken.

2019-05-14 Thread Alan Myrvold
That is a typo added in https://github.com/apache/beam/pull/8194 https://github.com/apache/beam/commit/1e7ea0da5073566c3fa26dbc1105105fbe6043ae#diff-9591f0d06e82e711681fd77ed287578b *From: *Ankur Goenka *Date: *Tue, May 14, 2019 at 1:43 PM *To: *dev Hi, > > The following tests seem to be broken

Re: All validates runner tests seems to be broken.

2019-05-14 Thread Andrew Pilloud
So it sounds like a number of the failures are related to a single jenkins config for all branches. This means you can't test the release branch if the targets change after it is cut. One possibility: do "Run Seed Job" on the release branch and kick off all the tests right after that finishes.

Re: All validates runner tests seems to be broken.

2019-05-14 Thread Lukasz Cwik
It's likely related to the rename done in https://github.com/apache/beam/commit/f198de033824949eb66ea533ae8a40b8dd8cd7fe#diff-2bb618406f7ee4470a48343283f368a2 Kenn is tracking a different issue related to publishing being broken in BEAM-7302 which he has a fix for in

Re: Contributing Beam Kata (Java & Python)

2019-05-14 Thread Reuven Lax
Merged *From: *Reza Rokni *Date: *Tue, May 14, 2019 at 1:29 PM *To: * *Cc: *Lars Francke +1 :-) > > *From: *Lukasz Cwik > *Date: *Wed, 15 May 2019 at 04:29 > *To: *dev > *Cc: *Lars Francke > > +1 >> >> *From: *Pablo Estrada >> *Date: *Tue, May 14, 2019 at 1:27 PM >> *To: *dev >> *Cc: *Lars

All validates runner tests seems to be broken.

2019-05-14 Thread Ankur Goenka
Hi, The following tests seem to be broken because of "Project 'unners' not found in root project 'beam'." The command being executed on Jenkins is gradlew --continue --max-workers=12 -Dorg.gradle.jvmargs=-Xms2g -Dorg.gradle.jvmargs=-Xmx4g :unners:samza:validatesRunner, which causes the failure. The same
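
The error above comes from the first segment of the Gradle task path (:unners instead of a real project). A minimal sketch of a pre-flight check that would flag such a path before it reaches Jenkins might look like the following. It is purely illustrative: the function names and the sample set of top-level projects are assumptions, not part of Beam's build or test-infra tooling.

```python
def first_project(task_path):
    """Return the first project segment of a fully qualified Gradle task path.

    Example input: ':runners:samza:validatesRunner'.
    Raises ValueError if the path is not of the form ':project:...:task'.
    """
    parts = task_path.split(":")
    # A fully qualified path starts with ':', so parts[0] is empty, and it
    # needs at least one project segment plus a task name.
    if len(parts) < 3 or parts[0] != "" or not all(parts[1:]):
        raise ValueError("not a fully qualified task path: %r" % task_path)
    return parts[1]


def unknown_root_projects(task_paths, known_projects):
    """Return the task paths whose first segment is not a known project."""
    return [p for p in task_paths if first_project(p) not in known_projects]


if __name__ == "__main__":
    # Hypothetical sample of top-level project names, for illustration only.
    known = {"runners", "sdks"}
    paths = [":unners:samza:validatesRunner", ":runners:samza:validatesRunner"]
    print(unknown_root_projects(paths, known))
```

A check like this only catches misspelled top-level names; it does not verify that the nested project or the task itself exists.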

Re: Intro

2019-05-14 Thread Robert Burke
Welcome aboard :D On Tue, 14 May 2019 at 13:28, Ahmet Altay wrote: > Welcome! Added you as a contributor to JIRA. > > *From: *Damien Desfontaines > *Date: *Tue, May 14, 2019 at 1:24 PM > *To: * > > Hi folks, >> >> I'm Damien from the Anonymization team at Google. I might contribute a >>

Re: Contributing Beam Kata (Java & Python)

2019-05-14 Thread Lukasz Cwik
+1 *From: *Pablo Estrada *Date: *Tue, May 14, 2019 at 1:27 PM *To: *dev *Cc: *Lars Francke +1 on merging. > > *From: *Reuven Lax > *Date: *Tue, May 14, 2019 at 1:23 PM > *To: *dev > *Cc: *Lars Francke > > I've been playing around with this the past day or two, and it's great! >> I'm inclined

Intro

2019-05-14 Thread Damien Desfontaines
Hi folks, I'm Damien from the Anonymization team at Google. I might contribute a couple of PRs on the Go SDK. Can someone give me permission to assign Jira tickets to myself? My username is desfontaines. Thanks in advance! Damien -- I'm working part-time, so I might not see your emails

Re: Contributing Beam Kata (Java & Python)

2019-05-14 Thread Reuven Lax
I've been playing around with this the past day or two, and it's great! I'm inclined to merge this PR (if nobody objects) so that others in the community can contribute more training katas. Reuven *From: *Ismaël Mejía *Date: *Tue, Apr 23, 2019 at 6:43 AM *To: *Lars Francke *Cc: * Thanks for

Re: Contributing Beam Kata (Java & Python)

2019-05-14 Thread Reza Rokni
+1 :-) *From: *Lukasz Cwik *Date: *Wed, 15 May 2019 at 04:29 *To: *dev *Cc: *Lars Francke +1 > > *From: *Pablo Estrada > *Date: *Tue, May 14, 2019 at 1:27 PM > *To: *dev > *Cc: *Lars Francke > > +1 on merging. >> >> *From: *Reuven Lax >> *Date: *Tue, May 14, 2019 at 1:23 PM >> *To: *dev >>

Re: Intro

2019-05-14 Thread Ahmet Altay
Welcome! Added you as a contributor to JIRA. *From: *Damien Desfontaines *Date: *Tue, May 14, 2019 at 1:24 PM *To: * Hi folks, > > I'm Damien from the Anonymization team at Google. I might contribute a > couple of PRs on the Go SDK. Can someone give me permission to assign Jira > tickets to

Re: Problem with gzip

2019-05-14 Thread Lukasz Cwik
Sorry I couldn't be more helpful. *From: *Allie Chen *Date: *Tue, May 14, 2019 at 10:09 AM *To: * *Cc: *user Thanks, Lukasz. Unfortunately, decompressing the files is not an option for > us. > > > I am trying to speed up the Reshuffle step, since it waits for all data. Here > are two ways I have

Re: Developing a new beam runner for Twister2

2019-05-14 Thread Kenneth Knowles
I added you to the Jira "Contributors" role, so you should be able to self-assign the ticket now. *From: *Pulasthi Supun Wickramasinghe *Date: *Tue, May 14, 2019 at 10:55 AM *To: *Maximilian Michels, *Cc: *dev Hi, > > Thanks Kenn and Max for the information. Will read up a little more and >

Re: Schema is not final... is it allowed to override

2019-05-14 Thread Reuven Lax
Can you explain what you're trying to do? I don't think that embedding the proto descriptor in the schema is a great way to go, but I may not be understanding the use case. *From: *Alex Van Boxel *Date: *Tue, May 14, 2019 at 9:00 AM *To: *ML Beam/Dev Hi Schema lovers, > > I'm implementing

Re: Schema is not final... is it allowed to override

2019-05-14 Thread Kenneth Knowles
Ultimately, a schema will be translated to the protobuf that is currently under design discussion, so Java-specific things like using inheritance to store extra data on a class are not good patterns here. However, coders are designed to be extensible, including having customized portable

Re: Jenkins commenting on PRs again

2019-05-14 Thread Yifan Zou
I've asked Infra and they are not sure either. *From: *Lukasz Cwik *Date: *Tue, May 14, 2019 at 10:09 AM *To: *dev I have seen this in the past, but I don't remember how it was resolved. > > Kenn is specifically asking about seeing messages: > asfgit commented 5 minutes

Re: Developing a new beam runner for Twister2

2019-05-14 Thread Pulasthi Supun Wickramasinghe
Hi, Thanks Kenn and Max for the information. Will read up a little more and discuss with the Twister2 team before deciding on which route to take. I also created an issue in BEAM JIRA[1], but I cannot assign it to myself. Would someone be able to assign the issue to me? Thanks in advance. [1]

Re: Beam 2.14.0 SNAPSHOTS are broken

2019-05-14 Thread Kenneth Knowles
I expect the problem is here: https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1093 That line was not modified in https://github.com/apache/beam/pull/8194/ even though it probably needs to be. Kenn *From: *Michael Luckey *Date:

Re: SqlTransform Metadata

2019-05-14 Thread Kenneth Knowles
We have support for nested rows so this should be easy. The .withMetadata would reify the struct, moving from Row to WindowedValue if I understand it... SqlTransform.query("SELECT field1 from PCOLLECTION"): Schema = { field1: type1, field2: type2 }

Re: Problem with gzip

2019-05-14 Thread Allie Chen
Thanks, Lukasz. Unfortunately, decompressing the files is not an option for us. I am trying to speed up the Reshuffle step, since it waits for all data. Here are two ways I have tried: 1. add timestamps to the PCollection's elements after reading (since it is a bounded source), then apply windowing

Re: Jenkins commenting on PRs again

2019-05-14 Thread Lukasz Cwik
I have seen this in the past, but I don't remember how it was resolved. Kenn is specifically asking about seeing messages: asfgit commented 5 minutes ago SUCCESS --none-- *From: *Kenneth Knowles *Date:

Jenkins commenting on PRs again

2019-05-14 Thread Kenneth Knowles
Does anyone know of a change underway that could cause this, or should we escalate to infra? https://github.com/apache/beam/pull/8576#issuecomment-492321226 Kenn

Re: SqlTransform Metadata

2019-05-14 Thread Anton Kedin
Reza, can you share more thoughts on how you think this can work end-to-end? Currently the approach is that populating the rows with the data happens before the SqlTransform, and within the query you can only use the things that are already in the rows or in the catalog/schema (or built-in

Re: Contributing Beam Kata (Java & Python)

2019-05-14 Thread Henry Suryawirawan
I have uploaded the Kata to Stepik which allows a seamless setup directly from the IDE. Please refer to the following comment for an updated instruction on how to set it up on your machine. https://github.com/apache/beam/pull/8358#issuecomment-492296450 On 2019/04/19 11:16:22, hs...@google.com

Schema is not final... is it allowed to override

2019-05-14 Thread Alex Van Boxel
Hi Schema lovers, I'm implementing schema support for Protobuf and I was wondering if it's allowed to override Schema. It looks tempting (as it's not final), as I need a container for the Proto Descriptor. For normal pre-compiled classes it's not required, but for DynamicMessage it is. If I

Re: Beam 2.14.0 SNAPSHOTS are broken

2019-05-14 Thread Michael Luckey
We definitely changed something. With BEAM-4046 artefact naming was adjusted. I'll have a look into that. Thanks for letting me know. Best, michel On Tue, May 14, 2019 at 5:29 PM Ismaël Mejía wrote: > Hello, > > Just updated a project I have to version 2.14.0-SNAPSHOT and found that > the

Beam 2.14.0 SNAPSHOTS are broken

2019-05-14 Thread Ismaël Mejía
Hello, Just updated a project I have to version 2.14.0-SNAPSHOT and found that the dependencies don't have the correct name, for example the beam-sdks-java-core pom [1] points to beam.model pipeline 2.14.0-SNAPSHOT compile But such dependency groupId / artifactId does not exist (or has

Re: Problem with gzip

2019-05-14 Thread Lukasz Cwik
Do you need to perform any joins across the files (e.g. Combine.perKey/GroupByKey/...)? If not, you could structure your pipeline
ReadFromFileA --> Reshuffle(optional) --> CopyOfPipelineA
ReadFromFileB --> Reshuffle(optional) --> CopyOfPipelineB
ReadFromFileC --> Reshuffle(optional) -->

Re: Problem with gzip

2019-05-14 Thread Allie Chen
Is it possible to use windowing or somehow pretend it is streaming so Reshuffle or GroupByKey won't wait until all data has been read? Thanks! Allie *From: *Lukasz Cwik *Date: *Fri, May 10, 2019 at 5:36 PM *To: *dev *Cc: *user There is no such flag to turn off fusion. > > Writing 100s of GiBs

Re: SqlTransform Metadata

2019-05-14 Thread Andrew Pilloud
Hi Reza, Where will this metadata be coming from? Beam SQL is tightly coupled with the schema of the PCollection, so adding fields not in the data would be difficult. If what you want is the timestamp out of the DoFn.ProcessContext we might be able to add a SQL function to fetch that. Andrew

Re: Developing a new beam runner for Twister2

2019-05-14 Thread Maximilian Michels
Hi Pulasthi, Great to hear you're planning to implement a Twister2 Runner. If you have limited time, you probably want to decide whether to build a "legacy" Java Runner or a portable one. They are not fundamentally different but there are some tricky implementation details for the portable

SqlTransform Metadata

2019-05-14 Thread Reza Rokni
Hi, What are folks' thoughts about adding something like SqlTransform.withMetadata().query(...) to enable users to access things like Timestamp information from within the query without having to reify the information into the element itself? Cheers Reza -- This email may be