Go SDK integration tests

2018-05-08 Thread Henning Rohde
Hi everyone,

 I'm currently tinkering with adding integration tests for Go (BEAM-3827)
and wrote down a small proposal to that end:

https://docs.google.com/document/d/1jy6EE7D4RjgfNV0FhD3rMsT1YKhnUfcHRZMAlC6ygXw/edit?usp=sharing

Similarly to other SDKs, the proposal is to add self-validating integration
tests that don't produce output. But unlike Java, we can't easily reuse the
Go example code directly, and instead use a driver program with all tests
linked in to run against an arbitrary runner.

Comments welcome!

Thanks,
 Henning


Build failed in Jenkins: beam_SeedJob #1652

2018-05-08 Thread Apache Jenkins Server
See 


Changes:

[github] Explictly delcare globals defined elsewhere

[mairbek] Introduced SpannerWriteResult that

[mairbek] Addressed comments

[mairbek] Happy checkstyle

[github] Adding a microbenchmark for side input iterables. (#5294)

[apilloud] Enable githubCommitNotifier for post commits

[tgroh] Migrate the `portable` subpackage to Portability

[Pablo] Make experiments as set attr of RuntimeValueProvider

--
Started by timer
[EnvInject] - Loading node environment variables.
Building remotely on beam3 (beam) in workspace 

Cloning the remote Git repository
Cloning repository https://github.com/apache/beam.git
 > git init  # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # 
 > timeout=10
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/${ghprbPullId}/*:refs/remotes/origin/pr/${ghprbPullId}/*
 > git rev-parse origin/master^{commit} # timeout=10
Checking out Revision 60f90c8dcb229c35a82c7be15e64a89678bae058 (origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 60f90c8dcb229c35a82c7be15e64a89678bae058
Commit message: "Make experiments as set attr of RuntimeValueProvider"
 > git rev-list --no-walk cd92c5e2edd275c377793532ac65b07fb571590d # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Processing DSL script job_00_seed.groovy
Processing DSL script job_beam_Inventory.groovy
Processing DSL script job_beam_PerformanceTests_Dataflow.groovy
Processing DSL script job_beam_PerformanceTests_FileBasedIO_IT.groovy
Processing DSL script job_beam_PerformanceTests_FileBasedIO_IT_HDFS.groovy
Processing DSL script job_beam_PerformanceTests_HadoopInputFormat.groovy
Processing DSL script job_beam_PerformanceTests_JDBC.groovy
Processing DSL script job_beam_PerformanceTests_MongoDBIO_IT.groovy
Processing DSL script job_beam_PerformanceTests_Python.groovy
Processing DSL script job_beam_PerformanceTests_Spark.groovy
Processing DSL script job_beam_PostCommit_Go_GradleBuild.groovy
Processing DSL script job_beam_PostCommit_Java_GradleBuild.groovy
Processing DSL script job_beam_PostCommit_Java_ValidatesRunner_Apex.groovy
Processing DSL script job_beam_PostCommit_Java_ValidatesRunner_Dataflow.groovy
Processing DSL script job_beam_PostCommit_Java_ValidatesRunner_Flink.groovy
Processing DSL script job_beam_PostCommit_Java_ValidatesRunner_Gearpump.groovy
Processing DSL script job_beam_PostCommit_Java_ValidatesRunner_Spark.groovy
Processing DSL script job_beam_PostCommit_Python_ValidatesContainer_Dataflow.groovy
Processing DSL script job_beam_PostCommit_Python_ValidatesRunner_Dataflow.groovy
Processing DSL script job_beam_PostCommit_Python_Verify.groovy
Processing DSL script job_beam_PostRelease_NightlySnapshot.groovy
ERROR: Could not read configuration file 
/x1/jenkins/jenkins-home/jobs/beam_PostRelease_NightlySnapshot/config.xml for 
job beam_PostRelease_NightlySnapshot
Not sending mail to unregistered user apill...@google.com
Not sending mail to unregistered user mair...@google.com
Not sending mail to unregistered user git...@alasdairhodge.co.uk


Re: Graal instead of docker?

2018-05-08 Thread Henning Rohde
There are indeed lots of possibilities for interesting docker alternatives
with different tradeoffs and capabilities, but in general both the runner
and the SDK must support them for it to work. As mentioned, docker
(as used in the container contract) is meant as a flexible main option but
not necessarily the only option. I see no problem with certain
pipeline-SDK-runner combinations additionally supporting a specialized
setup. The pipeline can be a factor, because some transforms might depend
on aspects of the runtime environment -- such as system libraries or
shelling out to a /bin/foo.

The worker boot code is tied to the current container contract, so
pre-launched workers would presumably not use that code path and would not
be bound by its assumptions. In particular, such a setup might want to
invert who initiates the connection from the SDK worker to the runner.
Pipeline options and global state in the SDK and user-function process
might make it difficult to safely reuse worker processes across pipelines,
but it is doable in certain scenarios.

Henning

On Tue, May 8, 2018 at 3:51 PM Thomas Weise  wrote:

>
>
> On Sat, May 5, 2018 at 3:58 PM, Robert Bradshaw 
> wrote:
>
>>
>> I would welcome changes to
>>
>> https://github.com/apache/beam/blob/v2.4.0/model/pipeline/src/main/proto/beam_runner_api.proto#L730
>> that would provide alternatives to docker (one of which comes to mind is
>> "I
>> already brought up a worker(s) for you (which could be the same process
>> that handled pipeline construction in testing scenarios), here's how to
>> connect to it/them.") Another option, which would seem to appeal to you in
>> particular, would be "the worker code is linked into the runner's binary,
>> use this process as the worker" (though note even for java-on-java, it can
>> be advantageous to shield the worker and runner code from each other's
>> environments, dependencies, and version requirements.) This latter should
>> still likely use the FnApi to talk to itself (either over GRPC on local
>> ports, or possibly better via direct function calls eliminating the RPC
>> overhead altogether--this is how the fast local runner in Python works).
>> There may be runner environments well controlled enough that "start up the
>> workers" could be specified as "run this command line." We should make
>> this
>> environment message extensible to other alternatives than "docker
>> container
>> url," though of course we don't want the set of options to grow too large
>> or we lose the promise of portability unless every runner supports every
>> protocol.
>>
>>
> The pre-launched worker would be an interesting option, which might work
> well for a sidecar deployment.
>
> The current worker boot code though makes the assumption that the runner
> endpoint to phone home to is known when the process is launched. That
> doesn't work so well with a runner that establishes its endpoint
> dynamically. Also, the assumption is baked in that a worker will only serve
> a single pipeline (provisioning API etc.).
>
> Thanks,
> Thomas
>
>


Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-08 Thread Scott Wegner
Thanks for the update JB. Please open JIRAs for any Gradle release
blockers with as much context as you have.

On Tue, May 8, 2018 at 12:32 PM Jean-Baptiste Onofré 
wrote:

> Hi guys,
>
> new update on the 2.5.0 release preparation.
>
> I tested the artifacts published by gradle (using gradlew
> publishToLocalMaven) in "my" beam-samples (which still use the mvn).
>
> Unfortunately the beam-samples project doesn't build, as some artifacts seem
> to be missing from the local repository.
> I'm also checking the generated maven coordinates (metadata,
> etc), and I'm not sure they are complete.
>
> On the other hand, I have a build failure using gradle on python SDK (I
> think it's an environment issue due to the update to Ubuntu 18.04, I'm
> checking the python version, lint, ...) and go SDK (investigating).
>
> So, I need more time to completely review artifacts and build.
>
> I keep you posted.
>
> Regards
> JB
>
> On 06/04/2018 10:48, Jean-Baptiste Onofré wrote:
> > Hi guys,
> >
> > Apache Beam 2.4.0 has been released on March 20th.
> >
> > According to our cycle of release (roughly 6 weeks), we should think
> about 2.5.0.
> >
> > I volunteer to tackle this release.
> >
> > I'm proposing the following items:
> >
> > 1. We start the Jira triage now, up to Tuesday
> > 2. I would like to cut the release on Tuesday night (Europe time)
> > 2bis. I think it's wiser to still use Maven for this release. Do you
> think we
> > will be ready to try a release with Gradle ?
> >
> > After this release, I would like a discussion about:
> > 1. Gradle release (if we release 2.5.0 with Maven)
> > 2. Isolate release cycle per Beam part. I think it would be interesting
> to have
> > different release cycles: SDKs, DSLs, Runners, IOs. That's another
> discussion, I
> > will start a thread about that.
> >
> > Thoughts ?
> >
> > Regards
> > JB
> >
>


Re: Graal instead of docker?

2018-05-08 Thread Thomas Weise
On Sat, May 5, 2018 at 3:58 PM, Robert Bradshaw  wrote:

>
> I would welcome changes to
> https://github.com/apache/beam/blob/v2.4.0/model/pipeline/src/main/proto/beam_runner_api.proto#L730
> that would provide alternatives to docker (one of which comes to mind is "I
> already brought up a worker(s) for you (which could be the same process
> that handled pipeline construction in testing scenarios), here's how to
> connect to it/them.") Another option, which would seem to appeal to you in
> particular, would be "the worker code is linked into the runner's binary,
> use this process as the worker" (though note even for java-on-java, it can
> be advantageous to shield the worker and runner code from each other's
> environments, dependencies, and version requirements.) This latter should
> still likely use the FnApi to talk to itself (either over GRPC on local
> ports, or possibly better via direct function calls eliminating the RPC
> overhead altogether--this is how the fast local runner in Python works).
> There may be runner environments well controlled enough that "start up the
> workers" could be specified as "run this command line." We should make this
> environment message extensible to other alternatives than "docker container
> url," though of course we don't want the set of options to grow too large
> or we lose the promise of portability unless every runner supports every
> protocol.
>
>
The pre-launched worker would be an interesting option, which might work
well for a sidecar deployment.

The current worker boot code though makes the assumption that the runner
endpoint to phone home to is known when the process is launched. That
doesn't work so well with a runner that establishes its endpoint
dynamically. Also, the assumption is baked in that a worker will only serve
a single pipeline (provisioning API etc.).
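An extensible Environment message of the kind discussed in this thread might look like the following sketch. This is illustrative only — the message and field names below are assumptions, not Beam's actual proto at v2.4.0 (which carries just a docker container url):

```proto
// Sketch only: an extensible Environment where the urn selects the
// mechanism (docker container, pre-launched external workers, or an
// in-process worker) and the payload carries mechanism-specific config.
message Environment {
  string urn = 1;     // e.g. "beam:env:docker:v1", "beam:env:external:v1"
  bytes payload = 2;  // serialized DockerPayload / ExternalPayload / ...
}

message DockerPayload {
  string container_image = 1;
}

message ExternalPayload {
  // Endpoint of an already-running worker pool the runner should
  // connect to, inverting the usual "worker phones home" direction.
  string endpoint = 1;
}
```

A (urn, payload) pair keeps the set of supported mechanisms open-ended while letting runners reject urns they do not understand.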

Thanks,
Thomas


Re: Graal instead of docker?

2018-05-08 Thread Eugene Kirpichov
On Tue, May 8, 2018 at 3:52 AM Romain Manni-Bucau 
wrote:

>
>
> On Tue, May 8, 2018 at 10:16, Robert Bradshaw wrote:
>
>> On Sun, May 6, 2018 at 1:30 AM Romain Manni-Bucau 
>> wrote:
>>
>> > Wow, this mail should be on the website Robert, thanks for it
>>
>> > I still have a point to try to understand better: my view is that once
>> submitted the only perf related point is when you hit a flow of data. So a
>> split can be slow but it is not that big a deal. So a runner integration
>> only needs to optimize process and nextElement logics, right?
>>
>> Yes. In some streaming cases (e.g. microbatch like Spark or Dataflow)
>> there
>> may be many, many bundles, so the "control plane" part can't be /too/
>> slow,
>> but it's not as performance critical.
>>
>> > It is almost always doable to batch that - with triggers and other
>> constraints. So the portable model is elegant but not done to be "fast" in
>> current state of impl.
>>
>> Actually batching and streaming RPCs for the data plane has been there
>> from
>> the start, for these reasons.
>>
>> > So this all leads to 2 needs:
>>
>> > 1. Have some native runner for dev
>> > 2. Have some bulk api for prod
>>
>> > In all cases this is decorrelated from any runner, no? Can even be a beam
>> subproject built on top of beam which would be very sane and ensure a
>> clear
>> separation of concerns no?
>>
>> The thing to do here would be to extend the Environment (message) to allow
>> for alternatives, and then abstract out the creation of a bundle executor
>> such that different ones could be instantiated based on this environment.
>>
>
> Agree so we need a generic runner delegating to "subrunners" (or runner
> impl) instead of impl-ing it in all runners. Sounds very sane, scalable and
> extensible/composable this way.
>
> Can we mark it as a backlog item and goal?
>
That's what java-fn-execution is doing: it's a library of various useful
things that different portable runners can utilize in case their control
code is written in Java - including e.g. interfacing with Docker or
with something else.


>
>
>> > On May 6, 2018 at 00:59, "Robert Bradshaw" wrote:
>>
>> >> Portability, at its core, is providing a spec for any runner to talk to
>> any
>> >> SDK. I personally think it's done a great job in cleaning up the model
>> by
>> >> forcing us to define a clean boundary (as specified at
>> >> https://github.com/apache/beam/tree/master/model ) between these two
>> >> components (even if the implementations of one or the other are
>> >> (temporarily, I hope for the most part) complicated). The pipeline (on
>> the
>> >> runner submission side) and work execution (on what has traditionally
>> been
>> >> called the fn api side) have concrete platform-independent
>> descriptions,
>> >> rather than being a set of Java classes.
>>
>> >> Currently, the portion that lives on the "runner" side of this boundary
>> is
>> >> often shared among Java runners (via libraries like runners core), but
>> it
>> >> is all still part of each runner, and because of this it removes the
>> >> requirement for the Runner to be Java just like it removes the requirement
>> for
>> >> the SDK to speak Java. (For example, I think a Python Dask runner
>> makes a
>> >> lot of sense, Dataflow may decide to implement larger portions of its
>> >> runner in Go or C++ or even behind a service, and I've used the Python
>> >> ULRunner to run the Java SDK over the Fn API for testing development
>> >> purposes).
>>
>> >> There is also the question of "why docker?" I actually don't see docker
>> all
>> >> that intrinsic to the protocol; one only needs to be able to define and
>> >> bring up workers that communicate on specified ports. Docker happens to
>> be
>> >> a fairly well supported way to package up an arbitrary chunk of code
>> (in
>> >> any language), together with its nearly arbitrarily specified
>> >> dependencies/environment, in a way that's well specified and easy to
>> start
>> >> up.
>>
>> >> I would welcome changes to
>>
>>
>> https://github.com/apache/beam/blob/v2.4.0/model/pipeline/src/main/proto/beam_runner_api.proto#L730
>> >> that would provide alternatives to docker (one of which comes to mind
>> is
>> "I
>> >> already brought up a worker(s) for you (which could be the same process
>> >> that handled pipeline construction in testing scenarios), here's how to
>> >> connect to it/them.") Another option, which would seem to appeal to you
>> in
>> >> particular, would be "the worker code is linked into the runner's
>> binary,
>> >> use this process as the worker" (though note even for java-on-java, it
>> can
>> >> be advantageous to shield the worker and runner code from each others
>> >> environments, dependencies, and version requirements.) This 

Re: Jenkins Post Commit Status to Github

2018-05-08 Thread Andrew Pilloud
Yep, messing with the groovy scripts appears to be the answer. We use
different Jenkins libraries to handle PRs vs. pushes to master, but I think I
finally figured it out. Change is here:
https://github.com/apache/beam/pull/5305

Andrew

On Mon, May 7, 2018 at 12:40 PM Kenneth Knowles  wrote:

> I think you want to mess with the groovy scripts in .test-infra/jenkins
>
> Kenn
>
> On Mon, May 7, 2018 at 11:12 AM Andrew Pilloud 
> wrote:
>
>> The Github branches page shows the status of the latest commit on each
>> branch and provides a set of links to the jobs run on that commit. But it
>> doesn't appear Jenkins is publishing status from post commit jobs. This
>> seems like a simple oversight that should be easy to fix. Could someone
>> point me in the right direction to fix this?
>>
>> Andrew
>>
>


Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-05-08 Thread Jean-Baptiste Onofré

Hi guys,

new update on the 2.5.0 release preparation.

I tested the artifacts published by gradle (using gradlew 
publishToLocalMaven) in "my" beam-samples (which still use the mvn).


Unfortunately the beam-samples project doesn't build, as some artifacts seem
to be missing from the local repository.
I'm also checking the generated maven coordinates (metadata,
etc), and I'm not sure they are complete.


On the other hand, I have a build failure using gradle on python SDK (I
think it's an environment issue due to the update to Ubuntu 18.04, I'm 
checking the python version, lint, ...) and go SDK (investigating).


So, I need more time to completely review artifacts and build.

I keep you posted.

Regards
JB

On 06/04/2018 10:48, Jean-Baptiste Onofré wrote:

Hi guys,

Apache Beam 2.4.0 has been released on March 20th.

According to our cycle of release (roughly 6 weeks), we should think about 
2.5.0.

I volunteer to tackle this release.

I'm proposing the following items:

1. We start the Jira triage now, up to Tuesday
2. I would like to cut the release on Tuesday night (Europe time)
2bis. I think it's wiser to still use Maven for this release. Do you think we
will be ready to try a release with Gradle ?

After this release, I would like a discussion about:
1. Gradle release (if we release 2.5.0 with Maven)
2. Isolate release cycle per Beam part. I think it would be interesting to have
different release cycles: SDKs, DSLs, Runners, IOs. That's another discussion, I
will start a thread about that.

Thoughts ?

Regards
JB



Re: Performance Testing - request for comments

2018-05-08 Thread Scott Wegner
A few thoughts:

1. Gradle can intelligently build only the dependencies necessary for a
task, so it shouldn't build all of Python for the test suite if you only
specify the task you're interested in. I'm not sure of the command for
"build all of the dependencies of my tests but don't run my tests"; maybe
"./gradlew mytests -x mytests" ?

2. Some tasks in the build are not yet cacheable for various reasons. So
you may see them getting rebuilt on the second execution even on success,
which would then be included in your overall build timing. Information
about which tasks were used from the build cache is available in the Gradle
build scan (--scan).

Another idea for measuring the execution time of just your tests would be
to pull this out of Gradle's build report.  Adding the --profile flag
generates a report in $buildDir/reports/profile, which should have the
timing info for just the task you're interested in:
https://docs.gradle.org/current/userguide/command_line_interface.html
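Pulling task timings out of the profile report could look like the following sketch. The report's exact HTML layout varies across Gradle versions, so the row pattern here is an assumption to adjust against a real report:

```python
# Sketch: extract per-task timings from a Gradle --profile report.
# The report layout varies across Gradle versions, so the regex below
# is an assumption; adjust it to the actual HTML you see.
import re

TASK_ROW = re.compile(
    r"<td>(?P<task>:[\w:.-]+)</td>\s*<td[^>]*>(?P<secs>[\d.]+)s</td>")

def task_timings(html):
    """Return {task_path: seconds} parsed from a profile report."""
    return {m.group("task"): float(m.group("secs"))
            for m in TASK_ROW.finditer(html)}

# Synthetic sample rows in the assumed layout, for illustration.
sample = """
<tr><td>:beam-sdks-java-io-jdbc:integrationTest</td><td class="numeric">421.300s</td></tr>
<tr><td>:beam-sdks-java-core:compileJava</td><td class="numeric">12.5s</td></tr>
"""
```

With timings isolated per task, only the integrationTest duration would need to reach the dashboard, independent of how much building happened in the same invocation.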

On Tue, May 8, 2018 at 8:23 AM Łukasz Gajowy 
wrote:

> Hi Beam Devs,
>
> currently PerfkitBenchmarker (a tool used to invoke performance tests) has
> two phases that run gradle commands:
>
>- Pre-build phase: this is where the whole beam repo is built. This
>phase is to prepare the necessary artifacts so that it doesn't happen when
>executing tests.
>- Actual test running phase. After all necessary code is built we run
>the test and measure its execution time. The execution time is displayed
>on the PerfKit dashboard [1].
>
> After the recent mvn-to-gradle migration we noticed that we are unable to
> "Pre build" the code[2]. Because one of the python related tasks fails, the
> whole "preBuild" phase fails silently and the actual building happens in
> the "test running" phase which increases the execution time (this is
> visible in the plots on the dashboard).
>
> This whole situation made me wonder about several things, and I'd like to
> ask you for opinions. I think:
>
>- we should skip all the python related tasks while building beam for
>java performance tests in PerfKit. Those are not needed anyway when we are
>running java tests. Is it possible to skip them in one go (e.g. the same
>way we skip all checks using the -xcheck option)?
>- the same goes for Python tests: we should skip all java related
>tasks when building beam for python performance tests in PerfKit. Note that
>this bullet is something to be developed in the future, as
>beam_PerformanceTests_Python job (the only Python Performance test job) is
>failing for 4 months now and seems abandoned. IMO it should be done when
>someone brings the test back to life. For now the job should be
>disabled.
>- we should modify Perfkit so that when the prebuild phase fails for
>some reason, the test is not executed. Now we don't do this and the test
>execution time depends on whether "gradle integrationTest" command builds
>something or just runs the test. IMO when the command has to build anything
>the execution time should not be included in the Dashboards, because it's a
>false result.
>
> What do you think of all this?
>
> [1]
> https://apache-beam-testing.appspot.com/explore?dashboard=5755685136498688
> [2] https://issues.apache.org/jira/browse/BEAM-4256
>
> Best regards,
> Łukasz Gajowy
>
>
>
>


Performance Testing - request for comments

2018-05-08 Thread Łukasz Gajowy
Hi Beam Devs,

currently PerfkitBenchmarker (a tool used to invoke performance tests) has
two phases that run gradle commands:

   - Pre-build phase: this is where the whole beam repo is built. This phase
   is to prepare the necessary artifacts so that it doesn't happen when
   executing tests.
   - Actual test running phase. After all necessary code is built we run
   the test and measure its execution time. The execution time is displayed
   on the PerfKit dashboard [1].

After the recent mvn-to-gradle migration we noticed that we are unable to
"Pre build" the code[2]. Because one of the python related tasks fails, the
whole "preBuild" phase fails silently and the actual building happens in
the "test running" phase which increases the execution time (this is
visible in the plots on the dashboard).

This whole situation made me wonder about several things, and I'd like to
ask you for opinions. I think:

   - we should skip all the python related tasks while building beam for
   java performance tests in PerfKit. Those are not needed anyway when we are
   running java tests. Is it possible to skip them in one go (e.g. the same
   way we skip all checks using the -xcheck option)?
   - the same goes for Python tests: we should skip all java related tasks
   when building beam for python performance tests in PerfKit. Note that this
   bullet is something to be developed in the future, as
   beam_PerformanceTests_Python job (the only Python Performance test job) is
   failing for 4 months now and seems abandoned. IMO it should be done when
   someone brings the test back to life. For now the job should be
   disabled.
   - we should modify Perfkit so that when the prebuild phase fails for
   some reason, the test is not executed. Now we don't do this and the test
   execution time depends on whether "gradle integrationTest" command builds
   something or just runs the test. IMO when the command has to build anything
   the execution time should not be included in the Dashboards, because it's a
   false result.
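The gating idea in the last bullet — never measure a run whose pre-build failed — can be sketched as follows (helper names here are hypothetical; PerfKit's real code differs):

```python
# Sketch (hypothetical helper names; PerfKit's real code differs): only
# measure the test command when the pre-build step succeeded, so build
# time never leaks into the reported execution time.
import subprocess
import time

def run_benchmark(prebuild_cmd, test_cmd):
    """Return the test duration in seconds, or None if pre-build or the
    test itself failed."""
    if subprocess.call(prebuild_cmd) != 0:
        # Fail loudly instead of silently folding build time into the
        # measurement (the current behavior described above).
        return None
    start = time.monotonic()
    result = subprocess.call(test_cmd)
    elapsed = time.monotonic() - start
    return elapsed if result == 0 else None
```

Returning None (rather than a timing) on pre-build failure makes the bad case visible on the dashboard instead of inflating the plotted execution time.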

What do you think of all this?

[1]
https://apache-beam-testing.appspot.com/explore?dashboard=5755685136498688
[2] https://issues.apache.org/jira/browse/BEAM-4256

Best regards,
Łukasz Gajowy


Re: [DISCUSS] State of the project: Culture and governance

2018-05-08 Thread Kenneth Knowles
On Tue, Jan 23, 2018 at 8:44 AM Ismaël Mejía  wrote:

> - Clear guidelines for the criteria to earn committership/PMC status.
>

The PMC discussed this quite a bit and we have now added this to the web
site: https://beam.apache.org/contribute/become-a-committer/

Kenn


Re: Apache Beam - jenkins question

2018-05-08 Thread Kamil Szewczyk
Hi, Jason

Sorry for the late response, I was on vacation. I would like to send messages
automatically to Slack with the Performance Analysis Daily Reports as described
in https://github.com/apache/beam/pull/5180;
an example report can be found on the old Apache Beam Slack:
https://apachebeam.slack.com/messages/CAB3W69SS/ Those messages were sent
by me, and the missing piece is adding SLACK_WEBHOOK_URL, a token that
allows posting messages to Slack. I will send it to you in a separate message.
So far I have only generated this token for the old Apache Beam Slack, but
to migrate to the new Slack only the credential in the Jenkins UI will need
to be replaced. We can do that later, as I don't know who is responsible for
managing the-asf.slack.com and can help me with that.
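For context, a Slack incoming webhook works by POSTing a small JSON payload to the secret URL. A sketch of the report-posting step (the SLACK_WEBHOOK_URL env-var name matches the thread; everything else is illustrative):

```python
# Sketch of posting a report line to Slack via an incoming webhook.
# SLACK_WEBHOOK_URL is the secret discussed in this thread; the payload
# shape follows Slack's incoming-webhook API ({"text": ...}).
import json
import os
import urllib.request

def build_payload(report_text):
    """Encode the incoming-webhook JSON body."""
    return json.dumps({"text": report_text}).encode("utf-8")

def post_report(report_text):
    url = os.environ.get("SLACK_WEBHOOK_URL")
    if not url:
        raise RuntimeError("SLACK_WEBHOOK_URL is not set in the Jenkins job")
    req = urllib.request.Request(
        url,
        data=build_payload(report_text),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Because the webhook URL alone grants posting rights, it belongs in the Jenkins credentials store (injected as an environment variable) rather than in the groovy job definitions.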


2018-04-28 1:31 GMT+02:00 Jason Kuster :

> Thanks for the heads-up regarding the permissions. At this point I need
> more information about the credentials we want to use -- Kamil, can you
> provide more info? What is the purpose of the credentials you want to use
> here?
>
> On Fri, Apr 27, 2018 at 3:50 PM Davor Bonaci  wrote:
>
>> Jason, you should now have all the permissions needed. (You should,
>> however, evaluate whether this is a good place for it. Executors
>> themselves, for example, might be an alternative.)
>>
>> On Fri, Apr 27, 2018 at 7:42 PM, Jason Kuster 
>> wrote:
>>
>>> See
>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/common_job_properties.groovy#L119
>>> for an example of this being
>>> done in practice to add the coveralls repo token as an environment variable.
>>>
>>> On Fri, Apr 27, 2018 at 12:41 PM Jason Kuster 
>>> wrote:
>>>
 Hi Kamil, Davor,

 I think what you want is the Jenkins secrets feature (see
 https://support.cloudbees.com/hc/en-us/articles/203802500-Injecting-Secrets-into-Jenkins-Build-Jobs).
 Davor, I believe you are the only
 one with enough karma on Jenkins to access the credentials UI; once the
 credential is created in Jenkins it should be able to be set as an
 environment variable through the Jenkins job configuration (groovy files in
 $BEAM_ROOT/.test-infra/jenkins). Hope this helps.

 Jason

 On Thu, Apr 26, 2018 at 8:43 PM Davor Bonaci  wrote:

> Hi Kamil --
> Thanks for reaching out.
>
> This is a great question for the dev@ mailing list. You may want to
> share a little bit more why you need, how long, frequency of updates to 
> the
> secret, etc. for the community to be aware how things work.
>
> Hopefully others on the mailing list can help you by manually putting
> the necessary secret into the cloud settings related to the executors.
>
> Davor
>
> -- Forwarded message --
> From: Kamil Szewczyk 
> Date: Tue, Apr 24, 2018 at 12:21 PM
> Subject: Apache Beam - jenkins question
> To: da...@apache.org
>
>
> Dear Davor
>
> I sent you a message on asf slack, wasn't sure how can I reach you.
>
> Anyway are you able to add secret (environment variable) to jenkins.
> ??
> Or point me to a person that would be able to do that ?
>
> Kind Regards
> Kamil Szewczyk
>
>

 --
 ---
 Jason Kuster
 Apache Beam / Google Cloud Dataflow

 See something? Say something. go/jasonkuster-feedback
 

>>>
>>>
>>> --
>>> ---
>>> Jason Kuster
>>> Apache Beam / Google Cloud Dataflow
>>>
>>> See something? Say something. go/jasonkuster-feedback
>>> 
>>>
>>
>>
>
> --
> ---
> Jason Kuster
> Apache Beam / Google Cloud Dataflow
>
> See something? Say something. go/jasonkuster-feedback
>


Re: Graal instead of docker?

2018-05-08 Thread Jean-Baptiste Onofré
It sounds reasonable to me and makes more sense.


Regards
JB

On May 8, 2018 at 12:53, Romain Manni-Bucau wrote:
>On Tue, May 8, 2018 at 10:16, Robert Bradshaw wrote:
>
>> On Sun, May 6, 2018 at 1:30 AM Romain Manni-Bucau
>
>> wrote:
>>
>> > Wow, this mail should be on the website Robert, thanks for it
>>
>> > I still have a point to try to understand better: my view is that
>once
>> submitted the only perf related point is when you hit a flow of data.
>So a
>> split can be slow but it is not that big a deal. So a runner
>integration
>> only needs to optimize process and nextElement logics, right?
>>
>> Yes. In some streaming cases (e.g. microbatch like Spark or Dataflow)
>there
>> may be many, many bundles, so the "control plane" part can't be /too/
>slow,
>> but it's not as performance critical.
>>
>> > It is almost always doable to batch that - with triggers and other
>> constraints. So the portable model is elegant but not done to be
>"fast" in
>> current state of impl.
>>
>> Actually batching and streaming RPCs for the data plane has been
>there from
>> the start, for these reasons.
>>
>> > So this all leads to 2 needs:
>>
>> > 1. Have some native runner for dev
>> > 2. Have some bulk api for prod
>>
>> > In all cases this is decorrelated from any runner, no? Can even be a
>beam
>> subproject built on top of beam which would be very sane and ensure a
>clear
>> separation of concerns no?
>>
>> The thing to do here would be to extend the Environment (message) to
>allow
>> for alternatives, and then abstract out the creation of a bundle
>executor
>> such that different ones could be instantiated based on this
>environment.
>>
>
>Agree so we need a generic runner delegating to "subrunners" (or runner
>impl) instead of impl-ing it in all runners. Sounds very sane, scalable
>and
>extensible/composable this way.
>
>Can we mark it as a backlog item and goal?
>
>
>
>> > On May 6, 2018 at 00:59, "Robert Bradshaw" wrote:
>>
>> >> Portability, at its core, is providing a spec for any runner to
>talk to
>> any
>> >> SDK. I personally think it's done a great job in cleaning up the
>model
>> by
>> >> forcing us to define a clean boundary (as specified at
>> >> https://github.com/apache/beam/tree/master/model ) between these
>two
>> >> components (even if the implementations of one or the other are
>> >> (temporarily, I hope for the most part) complicated). The pipeline
>(on
>> the
>> >> runner submission side) and work execution (on what has
>traditionally
>> been
>> >> called the fn api side) have concrete platform-independent
>descriptions,
>> >> rather than being a set of Java classes.
>>
>> >> Currently, the portion that lives on the "runner" side of this
>boundary
>> is
>> >> often shared among Java runners (via libraries like runners core),
>but
>> it
>> >> is all still part of each runner, and because of this it removes
>the
>> >> requirement for the Runner to be Java just like it removes the
>requirement
>> for
>> >> the SDK to speak Java. (For example, I think a Python Dask runner
>makes
>> a
>> >> lot of sense, Dataflow may decide to implement larger portions of
>its
>> >> runner in Go or C++ or even behind a service, and I've used the
>Python
>> >> ULRunner to run the Java SDK over the Fn API for testing
>development
>> >> purposes).
>>
>> >> There is also the question of "why docker?" I actually don't see
>docker
>> all
>> >> that intrinsic to the protocol; one only needs to be able to
>define and
>> >> bring up workers that communicate on specified ports. Docker
>happens to
>> be
>> >> a fairly well supported way to package up an arbitrary chunk of
>code (in
>> >> any language), together with its nearly arbitrarily specified
>> >> dependencies/environment, in a way that's well specified and easy
>to
>> start
>> >> up.
>>
>> >> I would welcome changes to
>>
>>
>>
>https://github.com/apache/beam/blob/v2.4.0/model/pipeline/src/main/proto/beam_runner_api.proto#L730
>> >> that would provide alternatives to docker (one of which comes to
>mind is
>> "I
>> >> already brought up a worker(s) for you (which could be the same
>process
>> >> that handled pipeline construction in testing scenarios), here's
>how to
>> >> connect to it/them.") Another option, which would seem to appeal
>to you
>> in
>> >> particular, would be "the worker code is linked into the runner's
>> binary,
>> >> use this process as the worker" (though note even for
>java-on-java, it
>> can
>> >> be advantageous to shield the worker and runner code from each
>others
>> >> environments, dependencies, and version requirements.) This latter
>> should
>> >> still likely use the FnApi to talk to itself (either over GRPC on
>local
>> >> ports, or possibly better via direct function calls eliminating
>the RPC
>> >> overhead altogether--this is how the fast local runner in Python
>works).
>> >> There may be runner environments well controlled enough that

Re: Graal instead of docker?

2018-05-08 Thread Romain Manni-Bucau
On Tue, May 8, 2018 at 10:16 AM, Robert Bradshaw  wrote:

> On Sun, May 6, 2018 at 1:30 AM Romain Manni-Bucau 
> wrote:
>
> > Wow, this mail should be on the website Robert, thanks for it
>
> > I still have a point to try to understand better: my view is that once
> submitted the only perf related point is when you hit a flow of data. So a
> split can be slow, but it is not that big a deal. So a runner integration
> only needs to optimize the process and nextElement logic, right?
>
> Yes. In some streaming cases (e.g. microbatch like Spark or Dataflow) there
> may be many, many bundles, so the "control plane" part can't be /too/ slow,
> but it's not as performance critical.
>
> > It is almost always doable to batch that - with triggers and other
> constraints. So the portable model is elegant, but not designed to be "fast" in
> its current state of implementation.
>
> Actually batching and streaming RPCs for the data plane has been there from
> the start, for these reasons.
>
> > So this all leads to 2 needs:
>
> > 1. Have some native runner for dev
> > 2. Have some bulk api for prod
>
> > In all cases this is decoupled from any runner, no? It could even be a Beam
> subproject built on top of Beam, which would be very sane and ensure a clear
> separation of concerns, no?
>
> The thing to do here would be to extend the Environment (message) to allow
> for alternatives, and then abstract out the creation of a bundle executor
> such that different ones could be instantiated based on this environment.
>

Agreed, so we need a generic runner delegating to "subrunners" (or runner
impls) instead of implementing it in all runners. That sounds very sane,
scalable, and extensible/composable.

Can we mark it as a backlog item and goal?



> > On May 6, 2018 at 00:59, "Robert Bradshaw"  wrote:
>
> >> Portability, at its core, is providing a spec for any runner to talk to
> any
> >> SDK. I personally think it's done a great job in cleaning up the model
> by
> >> forcing us to define a clean boundary (as specified at
> >> https://github.com/apache/beam/tree/master/model ) between these two
> >> components (even if the implementations of one or the other are
> >> (temporarily, I hope for the most part) complicated). The pipeline (on
> the
> >> runner submission side) and work execution (on what has traditionally
> been
> >> called the fn api side) have concrete platform-independent descriptions,
> >> rather than being a set of Java classes.
>
> >> Currently, the portion that lives on the "runner" side of this boundary
> is
> >> often shared among Java runners (via libraries like runners core), but
> it
> >> is all still part of each runner, and because of this it removes the
> >> requirement for the runner to be Java, just as it removes the requirement
> for
> >> the SDK to speak Java. (For example, I think a Python Dask runner makes
> a
> >> lot of sense, Dataflow may decide to implement larger portions of its
> >> runner in Go or C++ or even behind a service, and I've used the Python
> >> ULRunner to run the Java SDK over the Fn API for testing development
> >> purposes).
>
> >> There is also the question of "why docker?" I actually don't see docker
> all
> >> that intrinsic to the protocol; one only needs to be able to define and
> >> bring up workers that communicate on specified ports. Docker happens to
> be
> >> a fairly well supported way to package up an arbitrary chunk of code (in
> >> any language), together with its nearly arbitrarily specified
> >> dependencies/environment, in a way that's well specified and easy to
> start
> >> up.
>
> >> I would welcome changes to
>
>
> https://github.com/apache/beam/blob/v2.4.0/model/pipeline/src/main/proto/beam_runner_api.proto#L730
> >> that would provide alternatives to docker (one of which comes to mind is
> "I
> >> already brought up a worker(s) for you (which could be the same process
> >> that handled pipeline construction in testing scenarios), here's how to
> >> connect to it/them.") Another option, which would seem to appeal to you
> in
> >> particular, would be "the worker code is linked into the runner's
> binary,
> >> use this process as the worker" (though note even for java-on-java, it
> can
> >> be advantageous to shield the worker and runner code from each other's
> >> environments, dependencies, and version requirements.) This latter
> should
> >> still likely use the FnApi to talk to itself (either over GRPC on local
> >> ports, or possibly better via direct function calls eliminating the RPC
> >> overhead altogether--this is how the fast local runner in Python works).
> >> There may be runner environments well controlled enough that "start up
> the
> >> workers" could be specified as "run this command line." We should make
> this
> >> environment message extensible to other alternatives than "docker
> container
> >> url," though of course we don't want the set of options to grow too
> large
> >> or we lose the promise of 

Re: Graal instead of docker?

2018-05-08 Thread Robert Bradshaw
On Sun, May 6, 2018 at 1:30 AM Romain Manni-Bucau 
wrote:

> Wow, this mail should be on the website Robert, thanks for it

> I still have a point to try to understand better: my view is that once
submitted the only perf related point is when you hit a flow of data. So a
split can be slow, but it is not that big a deal. So a runner integration
only needs to optimize the process and nextElement logic, right?

Yes. In some streaming cases (e.g. microbatch like Spark or Dataflow) there
may be many, many bundles, so the "control plane" part can't be /too/ slow,
but it's not as performance critical.

> It is almost always doable to batch that - with triggers and other
constraints. So the portable model is elegant, but not designed to be "fast" in
its current state of implementation.

Actually batching and streaming RPCs for the data plane has been there from
the start, for these reasons.
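
For illustration only (a minimal sketch, not Beam's actual data-plane code),
grouping a stream of elements into fixed-size bundles before they go over a
streaming RPC might look like:

```python
def batched(elements, batch_size=100):
    """Group an iterable into fixed-size chunks, yielding the final
    partial chunk too; each chunk would be one data-plane message."""
    buf = []
    for element in elements:
        buf.append(element)
        if len(buf) == batch_size:
            yield buf
            buf = []
    if buf:
        yield buf
```

For example, `list(batched(range(7), batch_size=3))` yields
`[[0, 1, 2], [3, 4, 5], [6]]`, so per-element RPC overhead is amortized
across each chunk.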

> So this all leads to 2 needs:

> 1. Have some native runner for dev
> 2. Have some bulk api for prod

> In all cases this is decoupled from any runner, no? It could even be a Beam
subproject built on top of Beam, which would be very sane and ensure a clear
separation of concerns, no?

The thing to do here would be to extend the Environment (message) to allow
for alternatives, and then abstract out the creation of a bundle executor
such that different ones could be instantiated based on this environment.
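
To make the idea concrete, here is a rough Python sketch (all names are
hypothetical, not Beam's actual API) of dispatching bundle-executor creation
on an extensible environment spec:

```python
from dataclasses import dataclass, field

@dataclass
class Environment:
    kind: str                 # e.g. "docker", "process", "embedded"
    payload: dict = field(default_factory=dict)

class DockerExecutor:
    def __init__(self, container_url):
        self.container_url = container_url  # image to boot as the worker

class ProcessExecutor:
    def __init__(self, command):
        self.command = command  # the "run this command line" alternative

class EmbeddedExecutor:
    """Worker linked into the runner binary; Fn API via direct calls."""

def make_bundle_executor(env):
    # Each runner supports some set of environment kinds; anything else
    # is rejected up front rather than silently mis-executed.
    if env.kind == "docker":
        return DockerExecutor(env.payload["container_url"])
    if env.kind == "process":
        return ProcessExecutor(env.payload["command"])
    if env.kind == "embedded":
        return EmbeddedExecutor()
    raise ValueError("unsupported environment kind: %s" % env.kind)
```

The point of the factory is exactly the extensibility discussed above: adding
a new environment alternative touches only the dispatch, not every runner.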

> On May 6, 2018 at 00:59, "Robert Bradshaw"  wrote:

>> Portability, at its core, is providing a spec for any runner to talk to
any
>> SDK. I personally think it's done a great job in cleaning up the model by
>> forcing us to define a clean boundary (as specified at
>> https://github.com/apache/beam/tree/master/model ) between these two
>> components (even if the implementations of one or the other are
>> (temporarily, I hope for the most part) complicated). The pipeline (on the
>> runner submission side) and work execution (on what has traditionally
been
>> called the fn api side) have concrete platform-independent descriptions,
>> rather than being a set of Java classes.

>> Currently, the portion that lives on the "runner" side of this boundary
is
>> often shared among Java runners (via libraries like runners core), but it
>> is all still part of each runner, and because of this it removes the
>> requirement for the runner to be Java, just as it removes the requirement
for
>> the SDK to speak Java. (For example, I think a Python Dask runner makes a
>> lot of sense, Dataflow may decide to implement larger portions of its
>> runner in Go or C++ or even behind a service, and I've used the Python
>> ULRunner to run the Java SDK over the Fn API for testing development
>> purposes).

>> There is also the question of "why docker?" I actually don't see docker
all
>> that intrinsic to the protocol; one only needs to be able to define and
>> bring up workers that communicate on specified ports. Docker happens to
be
>> a fairly well supported way to package up an arbitrary chunk of code (in
>> any language), together with its nearly arbitrarily specified
>> dependencies/environment, in a way that's well specified and easy to
start
>> up.

>> I would welcome changes to

https://github.com/apache/beam/blob/v2.4.0/model/pipeline/src/main/proto/beam_runner_api.proto#L730
>> that would provide alternatives to docker (one of which comes to mind is
"I
>> already brought up a worker(s) for you (which could be the same process
>> that handled pipeline construction in testing scenarios), here's how to
>> connect to it/them.") Another option, which would seem to appeal to you
in
>> particular, would be "the worker code is linked into the runner's binary,
>> use this process as the worker" (though note even for java-on-java, it
can
>> be advantageous to shield the worker and runner code from each other's
>> environments, dependencies, and version requirements.) This latter should
>> still likely use the FnApi to talk to itself (either over GRPC on local
>> ports, or possibly better via direct function calls eliminating the RPC
>> overhead altogether--this is how the fast local runner in Python works).
>> There may be runner environments well controlled enough that "start up
the
>> workers" could be specified as "run this command line." We should make
this
>> environment message extensible to other alternatives than "docker
container
>> url," though of course we don't want the set of options to grow too large
>> or we lose the promise of portability unless every runner supports every
>> protocol.

>> Of course, the runner is always free to execute any Fn for which it
>> completely understands the URN and the environment any way it pleases,
e.g.
>> directly in process, or even via lighter-weight mechanism like Jython or
>> Graal, rather than asking an external process to do it. But we need a
>> lowest common denominator for executing arbitrary URNs runners are not
>> expected to understand.
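
A hedged sketch of that "execute natively when the URN is understood,
otherwise defer to the environment's worker" dispatch (hypothetical names
and URNs, not real Beam code):

```python
# URNs this (imaginary) runner knows how to execute in-process.
NATIVE_FNS = {
    "beam:transform:example:sum:v1": lambda elements: [sum(elements)],
}

def execute_fn(urn, elements, fallback_worker):
    fn = NATIVE_FNS.get(urn)
    if fn is not None:
        return fn(elements)  # no RPC, no container: a direct call
    # Lowest common denominator: hand the bundle to the environment's
    # worker (e.g. a docker-based SDK harness).
    return fallback_worker(urn, elements)
```

Here `execute_fn("beam:transform:example:sum:v1", [1, 2, 3], worker)` returns
`[6]` without ever touching the worker, while an unknown URN falls through to
whatever the environment provides.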

>> As an aside, there are also technical 

Build failed in Jenkins: beam_Release_Gradle_NightlySnapshot #32

2018-05-08 Thread Apache Jenkins Server
See 


Changes:

[thw] [BEAM-4246] Remove Maven build directory dependency in Apex runner.

[thw] Upgrade Apex version to 3.7.0 to get rid of poisened netlet 1.3.0

[daniel.o.programmer] Change to docstring in filesystem.py

[github] Moving Python PostCommit into a Gradle task. (#5289)

--
[...truncated 2.51 MB...]
:
 warning: Cannot find annotation method 'value()' in type 'DefaultAnnotation'
:
 warning: Cannot find annotation method 'value()' in type 'DefaultAnnotation'
:174:
 warning: [NullablePrimitive] @Nullable should not be used for primitive types 
since they cannot be null
  @Nullable
  ^
(see http://errorprone.info/bugpattern/NullablePrimitive)
  Did you mean to remove this line?
:180:
 warning: [NullablePrimitive] @Nullable should not be used for primitive types 
since they cannot be null
  @Nullable
  ^
(see http://errorprone.info/bugpattern/NullablePrimitive)
  Did you mean to remove this line?
:186:
 warning: [NullablePrimitive] @Nullable should not be used for primitive types 
since they cannot be null
  @Nullable
  ^
(see http://errorprone.info/bugpattern/NullablePrimitive)
  Did you mean to remove this line?
:284:
 warning: [NullablePrimitive] @Nullable should not be used for primitive types 
since they cannot be null
  @Nullable
  ^
(see http://errorprone.info/bugpattern/NullablePrimitive)
  Did you mean to remove this line?
:92:
 warning: [ImmutableEnumChecker] enums should be immutable: 'NexmarkSuite' has 
field 'configurations' of type 
'java.util.List', 'List' is 
mutable
  private final List configurations;
   ^
(see http://errorprone.info/bugpattern/ImmutableEnumChecker)
:40:
 warning: [ImmutableEnumChecker] enums should be immutable: 'Tag' has non-final 
field 'value'
private int value = -1;
^
(see http://errorprone.info/bugpattern/ImmutableEnumChecker)
  Did you mean 'private final int value = -1;'?
:
 warning: Cannot find annotation method 'value()' in type 'DefaultAnnotation'
:41:
 warning: [MutableConstantField] Constant field declarations should use the 
immutable type (such as ImmutableList) instead of the general collection 
interface type (such as List)
  public static final Map ADAPTERS =
 ^
(see http://errorprone.info/bugpattern/MutableConstantField)
  Did you mean 'public static final ImmutableMap 
ADAPTERS ='?
:
 warning: Cannot find annotation method 'value()' in type 'DefaultAnnotation'
:131:
 warning: [IntLongMath] Expression of type int may overflow before being 
assigned to a long
/