Re: Discussion: Scheduling across runner and SDKHarness in Portability framework

2018-08-20 Thread Ankur Goenka
That's right. To add to it: we added multithreading to Python streaming because a single thread is suboptimal for the streaming use case. Shall we move towards a conclusion on the SDK bundle processing upper bound? On Mon, Aug 20, 2018 at 1:54 PM Lukasz Cwik wrote: > Ankur, I can see where you are goin

Bug or confusing python code? Are these the same element count metrics?

2018-08-20 Thread Alex Amato
I discovered something while trying to update test_progress_metrics in fn_api_runner_tests.py to inspect the returned MonitoringInfos in addition to

Re: Bootstrapping Beam's Job Server

2018-08-20 Thread Thomas Weise
The original objective was to make test/development easier (which I think is super important for user experience with portable runner). From first hand experience I can confirm that dealing with Flink clusters and Docker containers for local setup is a significant hurdle for Python developers. T

Re: [PROPOSAL] Prepare Beam 2.7.0 release

2018-08-20 Thread Boyuan Zhang
+1 Thanks for volunteering, Charles! On Mon, Aug 20, 2018 at 3:22 PM Rafael Fernandez wrote: > +1, thanks for volunteering, Charles! > > On Mon, Aug 20, 2018 at 12:09 PM Charles Chen wrote: > >> Thank you Andrew for pointing out my mistake. We should follow the >> calendar and aim to cut on 8/

Re: Bootstrapping Beam's Job Server

2018-08-20 Thread Henning Rohde
>> Option 3) would be to map in the docker binary and socket to allow >> the containerized Flink job server to start "sibling" containers on >> the host. > >Do you mean packaging Docker inside the Job Server container and >mounting /var/run/docker.sock from the host inside the container? That >look

Re: [PROPOSAL] Prepare Beam 2.7.0 release

2018-08-20 Thread Rafael Fernandez
+1, thanks for volunteering, Charles! On Mon, Aug 20, 2018 at 12:09 PM Charles Chen wrote: > Thank you Andrew for pointing out my mistake. We should follow the > calendar and aim to cut on 8/29, not 9/7 as I incorrectly wrote earlier. > > On Mon, Aug 20, 2018 at 12:02 PM Andrew Pilloud > wrote

Re: Bootstrapping Beam's Job Server

2018-08-20 Thread Maximilian Michels
Thanks for your suggestions. Please see below. > Option 3) would be to map in the docker binary and socket to allow > the containerized Flink job server to start "sibling" containers on > the host. Do you mean packaging Docker inside the Job Server container and mounting /var/run/docker.sock from

Re: Discussion: Scheduling across runner and SDKHarness in Portability framework

2018-08-20 Thread Lukasz Cwik
Ankur, I can see where you are going with your argument. I believe there is certain information which is static and won't change at pipeline creation time (such as Python SDK is most efficient doing one bundle at a time) and some stuff which is best at runtime, like memory and CPU limits, worker co

Re: Discussion: Scheduling across runner and SDKHarness in Portability framework

2018-08-20 Thread Ankur Goenka
I would prefer to keep it dynamic as it can be changed by the infrastructure or the pipeline author. For example, in Python the number of concurrent bundles can be changed by setting the pipeline option worker_count, and for Java it can be computed based on the CPUs on the machine. For Flink runner, we

Re: Travis apache credentials

2018-08-20 Thread Lukasz Cwik
If you can't get an answer quickly, it's best to read the Apache policy on release signing: http://www.apache.org/dev/release-signing.html On Mon, Aug 20, 2018 at 10:16 AM Pablo Estrada wrote: > This would mean that released artifacts are signed with two different keys > (wheels with travis / jar

Re: Should we mention TF Transform in Beam site?

2018-08-20 Thread Ankur Goenka
+1 Adding to Thomas's suggestion, we can also add the future plans to keep people excited about what's in store. On Mon, Aug 20, 2018 at 11:09 AM Thomas Weise wrote: > +1 > > It would also be helpful to mention important current restrictions wrt > availability on runners, etc. > > On Mon, Aug 2

Re: Git export-ignore for gradle

2018-08-20 Thread Andrew Pilloud
We can't ship gradle-wrapper.jar in the artifacts, so the gradlew command wouldn't work even if it was included. See https://issues.apache.org/jira/browse/LEGAL-288 and the original discussion here: https://lists.apache.org/thread.html/d8bd7a0395d979246b3aff02fbb562ac0467828c4adfc25029839fab@%3Cdev

Re: Bootstrapping Beam's Job Server

2018-08-20 Thread Ankur Goenka
Option 4) We are also thinking about adding process based SDKHarness. This will avoid docker in docker scenario. Process based SDKHarness also has other applications and might be desirable in some of the production use cases. On Mon, Aug 20, 2018 at 11:49 AM Henning Rohde wrote: > Option 3) woul

Re: [PROPOSAL] Prepare Beam 2.7.0 release

2018-08-20 Thread Charles Chen
Thank you Andrew for pointing out my mistake. We should follow the calendar and aim to cut on 8/29, not 9/7 as I incorrectly wrote earlier. On Mon, Aug 20, 2018 at 12:02 PM Andrew Pilloud wrote: > +1 Thanks for volunteering! The calendar I have puts the cut date at > August 29th, which looks to

Re: [PROPOSAL] Prepare Beam 2.7.0 release

2018-08-20 Thread Andrew Pilloud
+1 Thanks for volunteering! The calendar I have puts the cut date at August 29th, which looks to be 6 weeks from when 2.6.0 was cut. Do I have the wrong calendar? See: https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com&ctz=America%2FLos_Angeles A

Re: Bootstrapping Beam's Job Server

2018-08-20 Thread Henning Rohde
Option 3) would be to map in the docker binary and socket to allow the containerized Flink job server to start "sibling" containers on the host. That both avoids docker-in-docker (which is indeed undesirable) as well as extra requirements for each SDK to spin up containers -- notably, if the runner

Re: [PROPOSAL] Prepare Beam 2.7.0 release

2018-08-20 Thread Connell O'Callaghan
+1 Charles thank you for taking this up and helping us maintain this schedule. On Mon, Aug 20, 2018 at 11:29 AM Charles Chen wrote: > Hey everyone, > > Our release calendar indicates that the process for the 2.7.0 Beam release > should start on September 7. > > I volunteer to perform this releas

[PROPOSAL] Prepare Beam 2.7.0 release

2018-08-20 Thread Charles Chen
Hey everyone, Our release calendar indicates that the process for the 2.7.0 Beam release should start on September 7. I volunteer to perform this release and propose the following schedule: - We start triaging issues in JIRA this week. - I will cut the initial 2.7.0 release branch on Septe

Re: Should we mention TF Transform in Beam site?

2018-08-20 Thread Thomas Weise
+1 It would also be helpful to mention important current restrictions wrt availability on runners, etc. On Mon, Aug 20, 2018 at 10:45 AM Pascal Gula wrote: > fully agree! > > On Mon, Aug 20, 2018 at 7:23 PM, Rui Wang wrote: > >> +1 to add it on Beam website. >> >> >> >> -Rui >> >> On Mon, Aug

Re: Should we mention TF Transform in Beam site?

2018-08-20 Thread Pascal Gula
fully agree! On Mon, Aug 20, 2018 at 7:23 PM, Rui Wang wrote: > +1 to add it on Beam website. > > > > -Rui > > On Mon, Aug 20, 2018 at 10:15 AM Pablo Estrada wrote: > >> Other projects mention their ML / Graph / misc tooling libraries in their >> websites. >> It may be good for Beam to direct p

Re: Should we mention TF Transform in Beam site?

2018-08-20 Thread Rui Wang
+1 to add it on Beam website. -Rui On Mon, Aug 20, 2018 at 10:15 AM Pablo Estrada wrote: > Other projects mention their ML / Graph / misc tooling libraries in their > websites. > It may be good for Beam to direct people to use Tensorflow Transform[1] if > they want to use beam for ML? > What

Re: Travis apache credentials

2018-08-20 Thread Pablo Estrada
This would mean that released artifacts are signed with two different keys (wheels with travis / jars and others with release manager's). Is this consistent with Apache policy? Just checking : ) -P. On Mon, Aug 20, 2018 at 9:47 AM Boyuan Zhang wrote: > Hey Robert, > > I think your idea would be

Should we mention TF Transform in Beam site?

2018-08-20 Thread Pablo Estrada
Other projects mention their ML / Graph / misc tooling libraries in their websites. It may be good for Beam to direct people to use Tensorflow Transform[1] if they want to use beam for ML? What do people think? Best -P. [1] https://www.tensorflow.org/tfx/transform/

Re: Travis apache credentials

2018-08-20 Thread Boyuan Zhang
Hey Robert, I think your idea would be possible if the following things are possible: 1. Link beam-wheels into travis-cli. I'm not sure who has the right permission to perform this operation. 2. We have a common svn credential or one of the beam committers would like to put his (or her) credential into

Re: Beam application upgrade on Flink crashes

2018-08-20 Thread Maximilian Michels
AFAIK the serializer used here is the CoderTypeSerializer which may not be recoverable because of changes to the contained Coder (TaggedKvCoder). It doesn't currently have a serialVersionUID, so even small changes could break serialization backwards-compatibility. As of now Beam doesn't offer the

Re: Discussion: Scheduling across runner and SDKHarness in Portability framework

2018-08-20 Thread Lukasz Cwik
+1 on making the resources part of a proto. Based upon what Henning linked to, the provisioning API seems like an appropriate place to provide this information. Thomas, I believe the environment proto is the best place to add information that a runner may want to know about upfront during pipeline

Re: duplicate key-value elements lost when transferring them as side-inputs

2018-08-20 Thread Lukasz Cwik
Yes, that is a bug. I filed and assigned https://issues.apache.org/jira/browse/BEAM-5184 to you, feel free to unassign if you're unable to make progress. On Mon, Aug 20, 2018 at 1:14 AM Plajt, Vaclav wrote: > Hi Beam devs, > > I'm working on Euphoria DSL, where we implemented `BroadcastHashJoin` >

Re: Status of IntelliJ with Gradle

2018-08-20 Thread Lukasz Cwik
Yes, I have the same issues with vendoring. These are the things that I have tried without success to get Intellij to import the vendored modules correctly: * attempted to modify the idea.module.scopes to only include the vendored artifacts (for some reason this is ignored and Intellij is relying o

Re: Beam application upgrade on Flink crashes

2018-08-20 Thread Stephan Ewen
Hi Jozef! When restoring state, the serializer that created the state must still be available, so the state can be read. It looks like some serializer classes were removed between Beam versions (or changed in an incompatible manner). Backwards compatibility of an operator implementation needs coo

Re: Status of IntelliJ with Gradle

2018-08-20 Thread Maximilian Michels
Thank you Etienne for opening the issue. Anyone else having problems with the shaded Protobuf dependency? On 20.08.18 16:14, Etienne Chauchot wrote: > Hi Max, > > I experienced the same, I had first opened a general ticket > (https://issues.apache.org/jira/browse/BEAM-4418) about gradle > improv

Bootstrapping Beam's Job Server

2018-08-20 Thread Maximilian Michels
Hi everyone, I wanted to get your opinion on the Job-Server startup [1] which is part of the portability story. I've created a docker container to bring up Beam's Job Server, which is the entry point for pipeline execution. Generally, this works fine when the backend (Flink in this case) runs ext

Re: Status of IntelliJ with Gradle

2018-08-20 Thread Etienne Chauchot
Hi Max, I experienced the same, I had first opened a general ticket (https://issues.apache.org/jira/browse/BEAM-4418) about gradle improvements and I just split it in several tickets. Here is the one concerning the same issue: https://issues.apache.org/jira/bro

Beam application upgrade on Flink crashes

2018-08-20 Thread Jozef Vilcek
Hello, I am attempting to upgrade Beam app from 2.5.0 running on Flink 1.4.0 to Beam 2.6.0 running on Flink 1.5.0. I am not aware of any state migration changes needed for Flink 1.4.0 -> 1.5.0 so I am just starting a new App with updated libs from Flink save-point captured by previous version of

Re: Status of IntelliJ with Gradle

2018-08-20 Thread Maximilian Michels
Sorry, please disregard this duplicate mail. The Apache mail relay was flaky and my client doesn't seem to handle it particularly well. On 20.08.18 15:51, Maximilian Michels wrote: > Hi Beamers, > > It's great to see the Beam build system overhauled. Thank you for all > the hard work. > > That s

Status of IntelliJ with Gradle

2018-08-20 Thread Maximilian Michels
Hi Beamers, It's great to see the Beam build system overhauled. Thank you for all the hard work. That said, I've just started contributing to Beam again and I feel really stupid for not having a fully-functional IDE. I've closely followed the IntelliJ/Gradle instructions [1]. In the terminal ever

Beam Dependency Check Report (2018-08-20)

2018-08-20 Thread Apache Jenkins Server
High Priority Dependency Updates Of Beam Python SDK:
Dependency Name | Current Version | Latest Version | Release Date Of the Current Used Version | Release Date Of The Latest Release | JIRA Issue
google-cloud-bigquery | 0.25.0 | 1.5.0

Re: Removing documentation for old Beam versions

2018-08-20 Thread Robert Bradshaw
On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise wrote: > > Yes, I think the separation of generated code will need to occur prior to > completing the merge and switching the web site to the main repo. > > There should be no reason to check generated documentation into either of the > repos/branches.

Travis apache credentials

2018-08-20 Thread Robert Bradshaw
Boyuan set up a nice repository for building Python wheels at https://github.com/apache/beam-wheels . Does anyone know if it would be possible to get SVN credentials for travis so every user wouldn't have to fork the repository and put their own in?

Git export-ignore for gradle

2018-08-20 Thread Jozef Vilcek
Hello, this commit added export-ignore for some of the gradle stuff https://github.com/apache/beam/commit/2a0f68b0c743d37c46486b81500043b4b420c825 This means that a downloaded zip archive of the git repository is not buildable via the `gradlew` command. I am curious about the rationale behind this feature

duplicate key-value elements lost when transferring them as side-inputs

2018-08-20 Thread Plajt, Vaclav
Hi Beam devs, I'm working on Euphoria DSL, where we implemented `BroadcastHashJoin` using side-inputs. But our test shows some missing data. We use `View.asMultimap()` to get our join-small-side to view in form of `PCollectionView<Map<K, Iterable<V>>>`. Then some duplicated key-value (the same key and value as som