Re: Pipeline options validation

2019-04-29 Thread Kenneth Knowles
Does it make use of the @Nullable annotation or just assume any object reference could be null? Now that we are on Java 8, can it use Optional as well? (pet issue of mine :-)
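Kenneth's Optional suggestion can be sketched in plain Java 8. A getter that returns `Optional<Integer>` makes "not set" part of the type, so validation becomes a presence check instead of a null check. This is a hypothetical holder class that borrows the option name from Ning's example. It does not establish whether Beam's PipelineOptions proxy accepts Optional-typed getters.

```java
import java.util.Optional;

// Hypothetical options holder (not Beam's PipelineOptions proxy): a field that
// was never set is reported as Optional.empty() instead of null or 0.
final class OptionalContainerOptions {
  private Integer numberOfContainers; // null internally means "never set"

  void setNumberOfContainers(int value) {
    numberOfContainers = value;
  }

  Optional<Integer> getNumberOfContainers() {
    return Optional.ofNullable(numberOfContainers);
  }

  // Validation reduces to a presence check on the Optional.
  int requireNumberOfContainers() {
    return getNumberOfContainers()
        .orElseThrow(() -> new IllegalArgumentException("numberOfContainers is required"));
  }

  public static void main(String[] args) {
    OptionalContainerOptions opts = new OptionalContainerOptions();
    System.out.println(opts.getNumberOfContainers().isPresent()); // false
    opts.setNumberOfContainers(0);
    System.out.println(opts.requireNumberOfContainers()); // 0, explicitly set
  }
}
```

An explicit `0` now counts as "set", and an option that was never set cannot be confused with a real value.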

Re: Pipeline options validation

2019-04-29 Thread Lukasz Cwik
The original ask for the ability to introspect whether a field is set or not was in BEAM-2261, and it was to improve the logic around default values. I filed BEAM-7180 for making validation check whether the field is set, rather than the current check of whether it is null.
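The "is it set?" check proposed in BEAM-7180 can be illustrated with a map-backed store that records which properties were explicitly assigned. This is a minimal sketch: the class and method names are hypothetical, and the sketch does not describe how Beam's options proxy actually stores values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical map-backed options store (not Beam's implementation) that
// remembers which properties were explicitly set.
final class SetTrackingOptions {
  private final Map<String, Object> values = new HashMap<>();

  void set(String property, Object value) {
    values.put(property, value); // presence is recorded even for 0 or null
  }

  // Unset properties fall back to the caller's default, like a primitive getter.
  int getInt(String property, int defaultValue) {
    Object v = values.get(property);
    return v == null ? defaultValue : (Integer) v;
  }

  boolean isSet(String property) {
    return values.containsKey(property);
  }

  // "Required" means "was set", not "is non-null".
  void requireSet(String property) {
    if (!isSet(property)) {
      throw new IllegalArgumentException("Missing required option: " + property);
    }
  }

  public static void main(String[] args) {
    SetTrackingOptions opts = new SetTrackingOptions();
    System.out.println(opts.getInt("numberOfContainers", 0) + " " + opts.isSet("numberOfContainers")); // 0 false
    opts.set("numberOfContainers", 0);
    System.out.println(opts.getInt("numberOfContainers", 0) + " " + opts.isSet("numberOfContainers")); // 0 true
  }
}
```

Both calls return the same value, 0. Only the presence check can tell a user's explicit 0 from a default.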

Re: Pipeline options validation

2019-04-29 Thread Lukasz Cwik
Kyle, you're right, and it makes sense from the doc, but from a user's point of view the validation is really asking whether the field has been set or not. The distinction between unset and set has come up in the past for PipelineOptions.

Re: Pipeline options validation

2019-04-29 Thread Kyle Weaver
Validation.Required: "This criteria specifies that the value must be not null. Note that this annotation should only be applied to methods that return nullable objects." [1] My guess is you should probably try the Integer class instead. [1]
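Kyle's reading of the docs explains the surprise in Ning's example. A null check can never fail for a primitive `int` getter, because an unset primitive must still return a value. A boxed `Integer` getter can return null. The sketch below is a stand-alone illustration built on stdlib reflection. `@RequiredOption` is a hypothetical stand-in for `@Validation.Required`. The map-backed `java.lang.reflect.Proxy` is a hypothetical stand-in for Beam's options proxy. The fallback to 0 models the fact that a primitive getter has to return something, so this is not Beam's actual code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for @Validation.Required; not the Beam annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RequiredOption {}

interface ContainerOptions {
  @RequiredOption int getPrimitiveContainers(); // unset -> 0, never null
  @RequiredOption Integer getBoxedContainers(); // unset -> null
}

final class NullCheckValidation {
  // Map-backed proxy: unset getters return null, except primitive int getters,
  // which must return a value and so fall back to 0.
  static <T> T create(Class<T> iface, Map<String, Object> values) {
    Object proxy = Proxy.newProxyInstance(
        iface.getClassLoader(),
        new Class<?>[] {iface},
        (p, method, args) -> {
          Object v = values.get(method.getName());
          return (v == null && method.getReturnType() == int.class) ? Integer.valueOf(0) : v;
        });
    return iface.cast(proxy);
  }

  // Null-check validation: reports @RequiredOption getters that return null.
  static List<String> missingRequired(Class<?> iface, Object options) {
    List<String> missing = new ArrayList<>();
    for (Method m : iface.getMethods()) {
      if (!m.isAnnotationPresent(RequiredOption.class)) {
        continue;
      }
      try {
        if (m.invoke(options) == null) {
          missing.add(m.getName());
        }
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException(e);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    ContainerOptions opts = create(ContainerOptions.class, new HashMap<>());
    // Neither option was set, but only the boxed one is reported as missing.
    System.out.println(missingRequired(ContainerOptions.class, opts)); // [getBoxedContainers]
  }
}
```

Switching the getter and setter to `Integer`, as Kyle suggests, is what lets a null check see "unset".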

Pipeline options validation

2019-04-29 Thread Ning Wang
Hi, Beam devs, I am working on a runner and found something not working as expected. I have this field in my H*PipelineOptions:

```
@Description("Number of Containers")
@Validation.Required
int getNumberOfContainers();
void setNumberOfContainers(int value);
```

and I am calling this

Re: [DISCUSS] Performance of Beam compare to "Bare Runner"

2019-04-29 Thread Kenneth Knowles
Specifically, a lot of shared code assumes that repeatedly setting a timer is nearly free / the same cost as determining whether or not to set the timer. ReduceFnRunner has been refactored in a way that would make it very easy to set the GC timer once per window that occurs in a bundle, but there's
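The "set the GC timer once per window per bundle" idea can be sketched as a small per-bundle cache placed in front of the timer service. This is a hypothetical helper and not ReduceFnRunner code. It assumes a window's GC timestamp is fixed, for example window end plus allowed lateness, so re-setting it within a bundle is redundant. The `BiConsumer` stands in for whatever timer-setting call a runner exposes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;

// Hypothetical helper (not ReduceFnRunner code): forwards the GC timer for a
// window to the timer service at most once per bundle.
final class GcTimerPerBundle<W> {
  private final BiConsumer<W, Long> setTimer; // stands in for the timer service call
  private final Set<W> seenThisBundle = new HashSet<>();

  GcTimerPerBundle(BiConsumer<W, Long> setTimer) {
    this.setTimer = setTimer;
  }

  // Assumes a window's GC time is fixed, so repeats within a bundle are redundant.
  void onElement(W window, long gcTimestamp) {
    if (seenThisBundle.add(window)) { // false if the window was already seen
      setTimer.accept(window, gcTimestamp);
    }
  }

  void finishBundle() {
    seenThisBundle.clear();
  }

  public static void main(String[] args) {
    List<String> calls = new ArrayList<>();
    GcTimerPerBundle<String> gc = new GcTimerPerBundle<>((w, t) -> calls.add(w + "@" + t));
    gc.onElement("w1", 100L);
    gc.onElement("w1", 100L);
    gc.onElement("w2", 200L);
    gc.finishBundle();
    System.out.println(calls); // [w1@100, w2@200]
  }
}
```

The trade-off is one hash-set lookup per element in exchange for skipping timer-service calls. This only pays off on runners where setting a timer is not actually "nearly free".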

Re: [discuss] A tweak to the Python API for SDF?

2019-04-29 Thread Lukasz Cwik
Pablo, all the SplittableDoFn stuff is marked as @Experimental, so we are free to change it. There really is only one complicated one to change, in Watch.java; the rest are quite straightforward.

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-04-29 Thread Pablo Estrada
Aw that's interesting! I think, with these considerations, I am only marginally more inclined towards publishing to test.pypi. That would make me a +0.9 on publishing RCs to the main pip repo then. Thanks for doing the research Ahmet. :) Best -P

Re: [discuss] A tweak to the Python API for SDF?

2019-04-29 Thread Pablo Estrada
Thanks all, @Luke - I imagine that would be an improvement to the API, but this may be harder as this is already available to users, and there are those who have implemented SDFs under the current API. Would it be possible to make a backwards-compatible change to the API here? For the Python

Re: :beam-sdks-java-io-hadoop-input-format:test is extremely flaky

2019-04-29 Thread Reuven Lax
yeah, that testClientConnecting test is also extremely flaky.

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-04-29 Thread Ahmet Altay
I asked the Airflow folks about this. See [1] for the full response and a link to one of their RC emails. To summarize, their position (specifically for PyPI) is: Unless a user does something explicit (such as using a flag, or explicitly requesting an rc release), pip install will not serve RC
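The behavior Ahmet describes can be modeled as a version-selection rule. The newest stable version wins unless pre-releases are explicitly allowed. pip's `--pre` flag is one such explicit opt-in. The sketch below is a simplified, hypothetical model, not pip's resolver. It only understands `X.Y.Z` and `X.Y.ZrcN` version strings, and it does not model the other opt-in mentioned above, pinning an exact rc version.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of pip's default pre-release handling; only versions of the
// form "X.Y.Z" and "X.Y.ZrcN" are supported.
final class PreReleaseFilter {
  private static final Pattern VERSION = Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)(?:rc(\\d+))?");

  // Sort key [major, minor, patch, rc]; a final release gets rc = MAX so it
  // sorts after every release candidate of the same version.
  private static long[] key(String version) {
    Matcher m = VERSION.matcher(version);
    if (!m.matches()) {
      throw new IllegalArgumentException("Unsupported version: " + version);
    }
    long rc = m.group(4) == null ? Long.MAX_VALUE : Long.parseLong(m.group(4));
    return new long[] {
        Long.parseLong(m.group(1)), Long.parseLong(m.group(2)), Long.parseLong(m.group(3)), rc};
  }

  static boolean isPreRelease(String version) {
    return key(version)[3] != Long.MAX_VALUE;
  }

  static int compare(String a, String b) {
    long[] ka = key(a);
    long[] kb = key(b);
    for (int i = 0; i < ka.length; i++) {
      int c = Long.compare(ka[i], kb[i]);
      if (c != 0) {
        return c;
      }
    }
    return 0;
  }

  // Newest candidate; release candidates are skipped unless allowPre is true
  // (roughly what passing --pre to pip does).
  static Optional<String> newest(List<String> versions, boolean allowPre) {
    return versions.stream()
        .filter(v -> allowPre || !isPreRelease(v))
        .max(PreReleaseFilter::compare);
  }

  public static void main(String[] args) {
    List<String> published = Arrays.asList("2.11.0", "2.12.0rc4");
    System.out.println(newest(published, false).get()); // 2.11.0
    System.out.println(newest(published, true).get()); // 2.12.0rc4
  }
}
```

Under this rule, publishing an RC to the main index leaves a plain install on the last stable release.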

Re: Enable security for data channels in portability

2019-04-29 Thread Lukasz Cwik
Changing the address to be loopback based upon how the environment is started (docker container/process/external/...) makes sense. How would the SDK and runner support storing/sharing this secret? (For example, in the docker container, how would the secret get there?)

Tip: Search through Beam mailing lists using a custom search engine.

2019-04-29 Thread Valentyn Tymofieiev
Custom search URLs: Dev: https://lists.apache.org/list.html?dev@beam.apache.org:lte=99M:%s User: https://lists.apache.org/list.html?u...@beam.apache.org:lte=99M:%s How to add a custom search engine in Google Chrome: https://support.google.com/chrome/answer/95426

Re: [BEAM-7164] Python precommit failing on Java PRs. dataflow:setupVirtualenv

2019-04-29 Thread Udi Meiri
Pip has a --cache-dir which should be safe with concurrent writes. On Fri, Apr 26, 2019 at 3:59 PM Ahmet Altay wrote: > It is possible to download dependencies with pip to a local directory and > install from there [1]. As a side benefit this is supposed to speed up the > installation process.

Re: [DISCUSS] Performance of Beam compare to "Bare Runner"

2019-04-29 Thread Reuven Lax
I think the short answer is that folks working on the Beam Flink runner have mostly been focused on getting everything working, and so have not dug into performance too deeply. I suspect that there is low-hanging fruit to optimize as a result. You're right that ReduceFnRunner schedules a

Re: Updates on Beam Jenkins

2019-04-29 Thread Alan Myrvold
Thanks for this work, Yifan!

Re: [VOTE] Release 2.12.0, release candidate #4

2019-04-29 Thread Connell O'Callaghan
Thank you Andrew!!!

Re: [DISCUSS] Performance of Beam compare to "Bare Runner"

2019-04-29 Thread Maximilian Michels
Hi Jozef, Yes, there is potential for overhead when running Beam pipelines on different Runners. The Beam model has an execution framework which each Runner utilizes in a slightly different way. Timers in Flink, for example, are uniquely identified by a namespace and a timestamp. In Beam,
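The Flink identity rule Max mentions can be modeled as a set keyed on (namespace, timestamp). Registering the same pair twice keeps one timer. A different timestamp in the same namespace is a separate, second timer. This is a hypothetical model, not Flink's timer service. It does not attempt the Beam side of the comparison, which is cut off in the excerpt above.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical model of a timer service whose timer identity is (namespace, timestamp).
final class NamespaceTimestampTimers {
  static final class TimerId {
    final String namespace;
    final long timestamp;

    TimerId(String namespace, long timestamp) {
      this.namespace = namespace;
      this.timestamp = timestamp;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof TimerId)) {
        return false;
      }
      TimerId other = (TimerId) o;
      return timestamp == other.timestamp && namespace.equals(other.namespace);
    }

    @Override
    public int hashCode() {
      return Objects.hash(namespace, timestamp);
    }
  }

  private final Set<TimerId> registered = new HashSet<>();

  // Returns true if a new timer was added, false if an identical one already existed.
  boolean register(String namespace, long timestamp) {
    return registered.add(new TimerId(namespace, timestamp));
  }

  int count() {
    return registered.size();
  }

  public static void main(String[] args) {
    NamespaceTimestampTimers timers = new NamespaceTimestampTimers();
    timers.register("window-1", 10L);
    timers.register("window-1", 10L); // same identity: deduplicated
    timers.register("window-1", 20L); // new timestamp: a second timer, not a replacement
    System.out.println(timers.count()); // 2
  }
}
```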

Re: Enable security for data channels in portability

2019-04-29 Thread Hai Lu
Hi Lukasz and Ankur, Thank you so much for your response! This is what we're doing/implementing in our internal fork right now: 1. We assume that the Java process and Python process *are always colocated on the same host*, so first of all we use the "loopback" address instead of "any
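The loopback-versus-wildcard distinction Hai describes shows up directly in how a server socket is bound. The sketch uses plain `java.net.ServerSocket` for illustration only. Beam's Fn API channels are gRPC servers, so a real change would be made in that server setup instead. The sketch also relies on Hai's assumption that both processes run on the same host, and it does not address the secret-sharing question raised elsewhere in this thread.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Illustrative only: Beam's Fn API channels are gRPC servers, not raw sockets.
final class LoopbackBinding {
  // Wildcard bind: reachable on every interface of the host.
  static ServerSocket bindAnyAddress() throws IOException {
    return new ServerSocket(0); // port 0 = pick a free ephemeral port
  }

  // Loopback bind: reachable only from processes on the same host.
  static ServerSocket bindLoopback() throws IOException {
    return new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
  }

  public static void main(String[] args) throws IOException {
    try (ServerSocket any = bindAnyAddress(); ServerSocket local = bindLoopback()) {
      System.out.println("any:      " + any.getInetAddress() + ":" + any.getLocalPort());
      System.out.println("loopback: " + local.getInetAddress() + ":" + local.getLocalPort());
    }
  }
}
```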

Re: [VOTE] Release 2.12.0, release candidate #4

2019-04-29 Thread Andrew Pilloud
Yes, they were moved and are now at https://dist.apache.org/repos/dist/release/beam/2.12.0/ On Fri, Apr 26, 2019 at 2:02 AM Robert Bradshaw wrote: > > Thanks for all the hard work! > > https://dist.apache.org/repos/dist/dev/beam/2.12.0/ seems empty; were > the artifacts already moved?

Re: [discuss] A tweak to the Python API for SDF?

2019-04-29 Thread Lukasz Cwik
Would it make sense to also do this in the Java SDK? That would make the restriction provider mirror TimerSpec and StateSpec, which use annotations similar to how it's done in Python.

Re: Updates on Beam Jenkins

2019-04-29 Thread Ismaël Mejía
Thanks Yifan for all your work. Sometimes the work on infrastructure is hidden, so it is great to acknowledge the importance of the improvements you and the others have done.

Re: Updates on Beam Jenkins

2019-04-29 Thread Lukasz Cwik
Thanks Yifan for driving this. On Mon, Apr 29, 2019 at 8:01 AM Yifan Zou wrote: > Hi all, > > > We now fully switched the Jenkins to new agents. The old agents are > deprecated and VMs will be deleted shortly to make more CPU available in > the us-central1 for tests. Please let me know if you

Re: :beam-sdks-java-io-hadoop-input-format:test is extremely flaky

2019-04-29 Thread Jean-Baptiste Onofré
Agree, +1 Regards JB

Re: :beam-sdks-java-io-hadoop-input-format:test is extremely flaky

2019-04-29 Thread Ismaël Mejía
+1 to removing it in this release; it is a maintenance pain for no real reason.

Re: :beam-sdks-java-io-hadoop-input-format:test is extremely flaky

2019-04-29 Thread Alexey Romanenko
Even though things have got much better since the port allocation issue was fixed for the embedded Cassandra cluster (thanks to Etienne!), which is used in hadoop-input-format and was the main cause of the flakiness, I’m 100% in favor of removing this module since it’s already been deprecated for

Beam Dependency Check Report (2019-04-29)

2019-04-29 Thread Apache Jenkins Server
ERROR: File 'src/build/dependencyUpdates/beam-dependency-check-report.html' does not exist

Re: :beam-sdks-java-io-hadoop-input-format:test is extremely flaky

2019-04-29 Thread Maximilian Michels
I don't know what's going on with it, but I agree it's annoying. I came across https://jira.apache.org/jira/browse/BEAM-6247; maybe it is time to remove this module for the next release? -Max On 26.04.19 20:10, Reuven Lax wrote: I find I usually have to rerun Presubmit multiple times to get a

Re: [discuss] A tweak to the Python API for SDF?

2019-04-29 Thread Robert Bradshaw
+1 to introducing this Param for consistency (and making the substitution more obvious), and I think SDF is still new/experimental enough that we can do this. I don't know if we need Spec in addition to Param and Provider.

[DISCUSS] Performance of Beam compare to "Bare Runner"

2019-04-29 Thread Jozef Vilcek
Hello, I am interested in any knowledge or thoughts on what the overhead of running Beam pipelines is, or should be, compared with pipelines written on a "bare runner". Is this something which is being tested or investigated by the community? Is there a consensus on what bounds the overhead should typically

Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-04-29 Thread Robert Bradshaw
On Mon, Apr 29, 2019 at 3:43 AM Reza Rokni wrote: > > @Robert Bradshaw Some examples, mostly built out from cases around Timeseries > data, don't want to derail this thread so at a hi level : Thanks. Perfectly on-topic for the thread. > Looping timers, a timer which allows for creation of a