Announcement & Proposal: HDFS tests on large cluster.

2018-06-06 Thread Łukasz Gajowy
Hi all, I'd like to announce that thanks to Kamil Szewczyk, since this PR we have 4 file-based HDFS tests run on a "Large HDFS Cluster"! More specifically I mean: - beam_PerformanceTests_Compressed_TextIOIT_HDFS -

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-06-06 Thread Jean-Baptiste Onofré
Yup, I disabled the daemon for the release plugin execution. Regards JB On 06/06/2018 08:39, Romain Manni-Bucau wrote: > Also maybe deactivate the daemon (--no-daemon) since its cache can get > corrupted ~easily. > > Romain Manni-Bucau

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-06-06 Thread Jean-Baptiste Onofré
It looks better with --no-parallel Regards JB On 06/06/2018 07:49, Jean-Baptiste Onofré wrote: > New issue during: > > ./gradlew publish -PisRelease > >> Task :beam-runners-apex:compileTestJava >

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-06-06 Thread Romain Manni-Bucau
Also maybe deactivate the daemon (--no-daemon) since its cache can get corrupted ~easily. Romain Manni-Bucau

[VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-06 Thread Jean-Baptiste Onofré
Hi everyone, Please review and vote on the release candidate #1 for the version 2.5.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) NB: this is the first release using Gradle, so don't be too harsh ;) A PR about the release guide

Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-06 Thread Etienne Chauchot
Thanks JB for all your work ! I believe doing the first gradle release must have been hard. I'll run Nexmark on the release and keep you posted. Best Etienne Le mercredi 06 juin 2018 à 10:44 +0200, Jean-Baptiste Onofré a écrit : > Hi everyone, > > Please review and vote on the release

Re: Announcement & Proposal: HDFS tests on large cluster.

2018-06-06 Thread Kenneth Knowles
This is rad. Another +1 from me for a bigger cluster. What do you need to make that happen? Kenn On Wed, Jun 6, 2018 at 10:16 AM Pablo Estrada wrote: > This is really cool! > > +1 for having a cluster with more than one machine run the test. > > -P. > > On Wed, Jun 6, 2018 at 9:57 AM Chamikara

Re: Beam SQL Pipeline Options

2018-06-06 Thread Kenneth Knowles
This is a nice short design discussion doc, and perhaps a cooler piece of news hidden in the paragraph :-) Kenn On Wed, Jun 6, 2018 at 9:24 AM Andrew Pilloud wrote: > We are just about to the point of having a working pure SQL workflow for > Beam! One of the last things that remains is how to

Re: Beam SQL Pipeline Options

2018-06-06 Thread arun kumar
Hi, thanks for the update. Could you please share any documentation you have for connecting to Postgres using Beam code? Thanks, Arun On Wed, Jun 6, 2018, 9:54 PM Andrew Pilloud wrote: > We are just about to the point of having a working pure SQL workflow for > Beam! One of the last things

Re: Beam breaks when it isn't loaded via the Thread Context Class Loader

2018-06-06 Thread Romain Manni-Bucau
Not sure the example is atomic enough, but in https://github.com/Talend/component-runtime/blob/master/component-runtime-manager/src/main/java/org/talend/sdk/component/runtime/manager/finder/StandaloneContainerFinder.java#L40 the "instance()" is a singleton used by all the runtime of the framework.

Re: [VOTE] Policies for managing Beam dependencies

2018-06-06 Thread Kenneth Knowles
+0.5 I like the spirit of these policies. I think they need a little wording work. Comments inline. On Wed, Jun 6, 2018 at 4:53 PM, Chamikara Jayalath > wrote: >> >> >> (1) Human readable reports on status of Beam dependencies are generated >> weekly and shared with the Beam community through

Re: [DISCUSS] [BEAM-4126] Deleting Maven build files (pom.xml) grace period?

2018-06-06 Thread Kenneth Knowles
+1 Definitely a good opportunity to decouple your build tools from your dependencies' build tools. On Wed, Jun 6, 2018 at 2:42 PM Ted Yu wrote: > +1 on this effort > > Original message > From: Chamikara Jayalath > Date: 6/6/18 2:09 PM (GMT-08:00) > To: dev@beam.apache.org,

Re: [Proposal] Apache Beam's Public Project Roadmap

2018-06-06 Thread Kenneth Knowles
This is great. I'm really excited to build a community process for this. Kenn On Wed, Jun 6, 2018 at 5:05 PM Griselda Cuevas wrote: > Hi Beam Community, > > I'd like to propose the creation of a Public Project Roadmap. Here are the > details as well as some artifacts I started already. > >

Re: [Call for items] Beam June Newsletter

2018-06-06 Thread Scott Wegner
Thanks for putting this together, Gris. I'm not familiar with the format: what time period should this newsletter cover? Is this a monthly newsletter and the June edition covers news from May? On Tue, Jun 5, 2018 at 4:47 PM Griselda Cuevas wrote: > Hi Everyone, > > Just a reminder to add items

Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-06 Thread Robert Bradshaw
Thank you JB! Glad to see this finally rolling out. I don't see the Python artifacts, did you mean to stage them in https://dist.apache.org/repos/dist/dev/beam/2.5.0/? If you want help building wheels, let me know. On Wed, Jun 6, 2018 at 1:50 AM Etienne Chauchot wrote: > Thanks JB for all

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-06-06 Thread Robert Bradshaw
Are there JIRAs filed for these? I have yet to have a corrupt cache, but it would be nice to know how to avoid and fix it. Did --no-parallel make the ErrorProne error go away? On Tue, Jun 5, 2018 at 11:39 PM Romain Manni-Bucau wrote: > Also maybe deactivate the daemon (--no-daemon) since its

Re: The full list of proposals / prototype documents

2018-06-06 Thread Alexey Romanenko
FYI: Finally, it was merged and you can find this page here: https://beam.apache.org/contribute/design-documents/ Thank you to everybody who helped me compile this list! I'll do my best to keep it updated with new docs. In the

Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-06 Thread Jean-Baptiste Onofré
Hi Robert, sorry, I missed this step, let me add on dist.apache.org. Thanks for the catch and sorry about that ! Regards JB On 06/06/2018 18:06, Robert Bradshaw wrote: > Thank you JB! Glad to see this finally rolling out. I don't see the > Python artifacts, did you mean to stage them > in 

Beam SQL Pipeline Options

2018-06-06 Thread Andrew Pilloud
We are just about to the point of having a working pure SQL workflow for Beam! One of the last things that remains is how to configure Pipeline Options via a SQL shell. I have written up a proposal to use the set statement, for example "SET runner=DataflowRunner". I'm looking for feedback,
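A minimal sketch of the idea in this message, assuming a hypothetical grammar of the form SET name=value with an optional trailing semicolon. The only detail taken from the thread is the example statement "SET runner=DataflowRunner"; the function names `parse_set_statement` and `to_pipeline_args`, the regular expression, and the `--name=value` flag output are illustrative assumptions, not the proposal's design or Beam's implementation.

```python
import re

# Hypothetical grammar (an assumption, not from the proposal):
# SET <name>=<value>, case-insensitive keyword, optional trailing semicolon.
_SET_PATTERN = re.compile(r"^\s*SET\s+(\w+)\s*=\s*(.+?)\s*;?\s*$", re.IGNORECASE)


def parse_set_statement(statement):
    """Return (name, value) for a SET statement, or None for anything else."""
    match = _SET_PATTERN.match(statement)
    if match is None:
        return None
    return match.group(1), match.group(2)


def to_pipeline_args(statements):
    """Collect SET statements into --name=value flags.

    A later SET for the same name overrides an earlier one;
    statements that are not SET statements are ignored.
    """
    options = {}
    for statement in statements:
        parsed = parse_set_statement(statement)
        if parsed is not None:
            name, value = parsed
            options[name] = value
    return [f"--{name}={value}" for name, value in options.items()]
```

A shell built this way could pass the resulting flags to whatever constructs the pipeline options, so options set interactively look the same as options given on a command line.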

Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-06 Thread Jean-Baptiste Onofré
I updated dist.apache.org dev with Python distribution. Regards JB On 06/06/2018 18:19, Jean-Baptiste Onofré wrote: > Hi Robert, > > sorry, I missed this step, let me add on dist.apache.org. > > Thanks for the catch and sorry about that ! > > Regards > JB > > On 06/06/2018 18:06, Robert

Re: Announcement & Proposal: HDFS tests on large cluster.

2018-06-06 Thread Pablo Estrada
This is really cool! +1 for having a cluster with more than one machine run the test. -P. On Wed, Jun 6, 2018 at 9:57 AM Chamikara Jayalath wrote: > On Wed, Jun 6, 2018 at 5:19 AM Łukasz Gajowy > wrote: > >> Hi all, >> >> I'd like to announce that thanks to Kamil Szewczyk, since this PR >>

Apache Beam Contribution Guide Improvements

2018-06-06 Thread Alan Myrvold
I've written up a brief document with ideas for Apache Beam contribution guide improvements. I'm most interested in clarifying the goals of this guide, including whether documenting on Windows is worth describing, and what areas are missing. Feedback welcome, especially comments / suggestions in

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-06-06 Thread Pablo Estrada
This is because the release plugin that we went with[1] only produces release (and release candidate) tags, but it does not create a new branch, so the branch itself had to be created manually. There were other plugins with the extra branching functionality[2], but we decided to use the more

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-06-06 Thread Jean-Baptiste Onofré
The tag is created by the plugin, but it's not pushed to the remote. I had to do: git push apache v2.5.0.RC1 And yes, I created the branch "manually". I also did a mvn versions:set on master to update the pom.xml, but not on the branch (as I focused on gradle release). Regards JB On 06/06/2018

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-06-06 Thread Scott Wegner
Tim and Boyuan were previously discussing similar issues in the Slack channel [1], and the root cause was related to JAR corruption by the signing plugin when using parallel builds. There was also some investigation in BEAM-4328 [2]. I believe fixes for all known issues are now merged. The

Re: Announcement & Proposal: HDFS tests on large cluster.

2018-06-06 Thread Chamikara Jayalath
On Wed, Jun 6, 2018 at 5:19 AM Łukasz Gajowy wrote: > Hi all, > > I'd like to announce that thanks to Kamil Szewczyk, since this PR > we have 4 file-based HDFS > tests run on a "Large HDFS Cluster"! More specifically I mean: > > -

SDK Harness Deployment

2018-06-06 Thread Thomas Weise
Hi, The current plan for running the SDK harness is to execute docker to launch SDK containers with service endpoints provided by the runner in the docker command line. In the case of Flink runner (prototype), the service endpoints are dynamically allocated per executable stage. There is

Re: Read from a Google Sheet based BigQuery table - Python SDK

2018-06-06 Thread Chamikara Jayalath
On Tue, Jun 5, 2018 at 9:56 PM Leonardo Biagioli wrote: > Hi Cham, > > thanks but those pages are related to the authentication inside Google > Cloud Platform services, I need to authenticate the job on Sheets… Since > that the required scope is https://www.googleapis.com/auth/drive is there > a

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-06-06 Thread Lukasz Cwik
I was under the impression that a "release" like plugin was added and it was meant to generate the git tag and do other release related tasks: https://github.com/apache/beam/blob/72cbd99d6b62bc7ed16dbd1288cd61d54e8bda37/build.gradle#L181 Pablo / Ahmet, do you have more information as it doesn't

Re: [DISCUSS] [BEAM-4126] Deleting Maven build files (pom.xml) grace period?

2018-06-06 Thread Chamikara Jayalath
+1 for the overall effort. As Pablo mentioned, we need some time to migrate internal Dataflow build off of Maven build files. I created https://issues.apache.org/jira/browse/BEAM-4512 for this. Thanks, Cham On Wed, Jun 6, 2018 at 1:30 PM Eugene Kirpichov wrote: > Is it possible for Dataflow to

Re: SDK Harness Deployment

2018-06-06 Thread Thomas Weise
Hi Henning, Here is a page that explains the scheduling and overall functioning of the task manager in Flink: https://ci.apache.org/projects/flink/flink-docs-release-1.5/internals/job_scheduling.html Here are the 2 issues: #1 each task manager process gets assigned multiple units of execution

Re: [DISCUSS] [BEAM-4126] Deleting Maven build files (pom.xml) grace period?

2018-06-06 Thread Ted Yu
+1 on this effort Original message From: Chamikara Jayalath Date: 6/6/18 2:09 PM (GMT-08:00) To: dev@beam.apache.org, u...@beam.apache.org Subject: Re: [DISCUSS] [BEAM-4126] Deleting Maven build files (pom.xml) grace period? +1 for the overall effort. As Pablo mentioned, we

Re: Proposal: keeping post-commit tests green

2018-06-06 Thread Mikhail Gryzykhin
Hello everyone, Most of the comments on my last draft addressed technical details of how to automate the specific processes proposed. No major process changes were suggested. If you have not yet, please review this document. Highlights from last change: * Bumped splitting tests jobs

Re: [DISCUSS] [BEAM-4126] Deleting Maven build files (pom.xml) grace period?

2018-06-06 Thread Pablo Estrada
I agree that we should delete the pom.xml files soon, as they create a burden for maintainers. I'd like to be able to extend the grace period by a bit, to allow the internal build systems at Google to move away from using the Beam poms. We use these pom files to build Dataflow workers, and thus

Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-06 Thread Reuven Lax
Agreed ! It's not ready being the first to try something. Thank you so much for helping blaze the way! Reuven On Wed, Jun 6, 2018, 11:50 AM Etienne Chauchot wrote: > Thanks JB for all your work ! I believe doing the first gradle release > must have been hard. > I'll run Nexmark on the release

Re: Beam breaks when it isn't loaded via the Thread Context Class Loader

2018-06-06 Thread Lukasz Cwik
Romain, can you point to an example of a global singleton registry that does this right for class loading (it may allow people to work towards such an effort)? On Tue, Jun 5, 2018 at 10:06 PM Romain Manni-Bucau wrote: > It is actually very localised in runner code where beam should reset the >

Re: SDK Harness Deployment

2018-06-06 Thread Henning Rohde
Thanks for writing down and explaining the problem, Thomas. Let me try to tease some of the topics apart. First, the basic setup is currently as follows: there are 2 worker processes (A) "SDK harness" and (B) "Runner harness" that need to communicate. A connects to B. The fundamental endpoint(s)

[DISCUSS] [BEAM-4126] Deleting Maven build files (pom.xml) grace period?

2018-06-06 Thread Lukasz Cwik
Note: Apache Beam will still provide pom.xml for each release it produces. This is only about people using Maven to build Apache Beam themselves and not relying on the released artifacts in Maven Central. With the first release using Gradle as the build system underway, I wanted to start this

Re: [Call for items] Beam June Newsletter

2018-06-06 Thread Griselda Cuevas
Great question Scott! Yes, this is a monthly Newsletter and the June Edition should cover the things that happened in May. On Wed, 6 Jun 2018 at 08:05, Scott Wegner wrote: > Thanks for putting this together, Gris. I'm not familiar with the format: > what time period this newsletter should

Re: [DISCUSS] [BEAM-4126] Deleting Maven build files (pom.xml) grace period?

2018-06-06 Thread Eugene Kirpichov
Is it possible for Dataflow to just keep a copy of the pom.xmls and delete them as soon as Dataflow is migrated? Overall +1, I've been using Gradle without issues for a while and almost forgot pom.xml's still existed. On Wed, Jun 6, 2018, 1:13 PM Pablo Estrada wrote: > I agree that we should

Re: Existing transactionality inconsistency in the Beam Java State API

2018-06-06 Thread Lukasz Cwik
Sounds great and thanks for the conclusion summary. On Tue, Jun 5, 2018 at 4:56 PM Charles Chen wrote: > Thanks everyone for commenting and contributing to the discussion. There > appears to be enough consensus on these points to start an initial > implementation. Specifically, I'd like to

Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-06 Thread Lukasz Cwik
I have added the 2.5.0 tab to the validation spreadsheet[1], please mark down which things you intend to validate for the release and update the community on progress. 1: https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=152451807 On Wed, Jun 6, 2018 at

Re: Managing outdated dependencies

2018-06-06 Thread Chamikara Jayalath
Since there seems to be a general agreement on these I think we can start a vote. Possible post-vote tasks include the following. * Generate human readable reports on status of Beam dependencies. * Automatically create JIRAs for significantly outdated dependencies based on above reports. * Copy

Re: Proposal: keeping precommit times fast

2018-06-06 Thread Robert Bradshaw
Even if it's not perfect, seems like it'd surely be a net win (and probably a large one). Also, the build cache should look back at more than just the single previous build, so if any previous jobs (up to the cache size limit) built/tested artifacts unchanged by the current PR, the results would

[VOTE] Policies for managing Beam dependencies

2018-06-06 Thread Chamikara Jayalath
Hi All, We recently had a discussion regarding managing Beam dependencies. Please see [1] for the email thread and [2] for the relevant document. This discussion resulted in the following policies. I believe these will help keep Beam in a healthy state while allowing human intervention when needed.

Re: SDK Harness Deployment

2018-06-06 Thread Henning Rohde
Thanks Thomas. The id provided to the SDK harness must be sent as a gRPC header when it connects to the TM. The TM can use a fixed port and multiplex requests based on that id - to match the SDK harness with the appropriate job/slot/whatnot. The relationship between SDK harness and TM is not
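The multiplexing described in this message can be sketched roughly as follows. This is an illustration under stated assumptions, not Flink or Beam code: the header name `worker_id`, the class `HarnessMultiplexer`, and its methods are hypothetical, and the connection metadata is modeled as plain (key, value) pairs rather than through a real gRPC API.

```python
# Hypothetical header name; the thread only says the id is sent as a gRPC header.
WORKER_ID_HEADER = "worker_id"


class HarnessMultiplexer:
    """Routes incoming SDK harness connections to the slot that expects them."""

    def __init__(self):
        self._slots_by_id = {}

    def expect(self, worker_id, slot):
        """Record that the harness launched with worker_id belongs to slot."""
        self._slots_by_id[worker_id] = slot

    def route(self, metadata):
        """Return the slot for a connection, given its (key, value) header pairs."""
        headers = dict(metadata)
        worker_id = headers.get(WORKER_ID_HEADER)
        if worker_id is None:
            raise KeyError(f"missing {WORKER_ID_HEADER} header")
        if worker_id not in self._slots_by_id:
            raise KeyError(f"unknown worker id: {worker_id}")
        return self._slots_by_id[worker_id]
```

With a lookup like this, a single fixed port can serve every harness, because the id carried in the header, not the port, identifies which job or slot a connection belongs to.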

Re: Proposal: keeping precommit times fast

2018-06-06 Thread Udi Meiri
To follow up on the Jenkins Job Cacher Plugin: Using a Jenkins plugin to save and reuse the Gradle cache for successive precommit jobs. The problem with this approach is that the precommit runs that a Jenkins server runs are unrelated. Say you have 2 PRs, A and B, and the precommit job for B

Re: [VOTE] Policies for managing Beam dependencies

2018-06-06 Thread Ahmet Altay
+1 Thank you for driving these decisions. I would make a meta-point: all other recent votes, and this one if it passes, could be converted to web site documents at some point in an easily accessible and linkable way. On Wed, Jun 6, 2018 at 4:53 PM, Chamikara Jayalath wrote: > Hi All, > > We

[Proposal] Apache Beam's Public Project Roadmap

2018-06-06 Thread Griselda Cuevas
Hi Beam Community, I'd like to propose the creation of a Public Project Roadmap. Here are the details as well as some artifacts I started already. *Proposal* *What?* Create a simple spreadsheet-based project roadmap

Re: [VOTE] Policies for managing Beam dependencies

2018-06-06 Thread Chamikara Jayalath
Hi Kenn, On Wed, Jun 6, 2018 at 8:14 PM Kenneth Knowles wrote: > +0.5 > > I like the spirit of these policies. I think they need a little wording > work. Comments inline. > > On Wed, Jun 6, 2018 at 4:53 PM, Chamikara Jayalath >> wrote: >>> >>> >>> (1) Human readable reports on status of Beam