Jenkins build is back to stable : beam_Release_NightlySnapshot #300

2017-01-17 Thread Apache Jenkins Server
See

Re: [DISCUSS] Python SDK status and next steps

2017-01-17 Thread Sergio Fernández
Hi On Tue, Jan 17, 2017 at 5:22 PM, Ahmet Altay wrote: > > tl;dr: I would like to start a discussion about merging python-sdk branch > to master branch. Python SDK is mature enough and merging it to master will > accelerate its development and adoption. > Good point, Ahmet! I've following close

Re: Build failed in Jenkins: beam_PostCommit_Java_RunnableOnService_Dataflow #2046

2017-01-17 Thread Jason Kuster
I've seen a number of these marked-as-failures-but-actually-successes builds recently. Has something changed? It looks like Jenkins may be failing to parse the Maven output; it reported that the Runners :: Google Cloud Dataflow module took 0ms to run. On Tue, Jan 17, 2017 at 11:12 PM, Apache Jenki

Re: why SimpleDoFnRunner can not support setup

2017-01-17 Thread Kenneth Knowles
Yes, anything that processes bundles of elements could implement that interface, while a DoFn actually has many more capabilities (side inputs, state, setup, teardown). If you want to discuss less generally, another way to explain this in terms of DoFn is that @Setup should happen before the DoFn

Re: [DISCUSS] Python SDK status and next steps

2017-01-17 Thread Frances Perry
+1 merged after 0.5. It's on a great trajectory in terms of development and community. On Tue, Jan 17, 2017 at 5:48 PM, Kenneth Knowles wrote: > Seems reasonable, and the timeline Davor suggests makes a lot of sense. > > On Tue, Jan 17, 2017 at 3:59 PM, Lukasz Cwik > wrote: > > > I'm also for

Re: why SimpleDoFnRunner can not support setup

2017-01-17 Thread Manu Zhang
Hi, As commented here by Kenn (+k...@google.com), the name DoFnRunner is misleading and could have nothing to do with DoFn. It's used for bundle processing (startBundle, processElement, stopBundle) so there are no setup and teardown
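
The distinction drawn in this message can be illustrated with a small sketch in plain Java, with no Beam dependency. All type names here (BundleProcessor, LifecycleFn, LifecycleDriver, RecordingFn) are hypothetical and are not Beam classes. The hook is called finishBundle rather than stopBundle, following the @FinishBundle naming used in other messages. The call order shown is an assumption made for illustration, not a statement of what any runner guarantees.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical interface: only the per-bundle hooks named in the message. */
interface BundleProcessor<T> {
    void startBundle();
    void processElement(T element);
    void finishBundle();
}

/** Hypothetical wider lifecycle: setup and teardown wrap all bundles. */
interface LifecycleFn<T> extends BundleProcessor<T> {
    void setup();
    void teardown();
}

final class LifecycleDriver {
    /** Calls setup once, runs every bundle, then calls teardown. */
    static <T> void run(LifecycleFn<T> fn, List<List<T>> bundles) {
        fn.setup();
        try {
            for (List<T> bundle : bundles) {
                runBundle(fn, bundle);
            }
        } finally {
            fn.teardown();
        }
    }

    /** Bundle-level driver: it knows nothing about setup or teardown. */
    static <T> void runBundle(BundleProcessor<T> processor, List<T> bundle) {
        processor.startBundle();
        for (T element : bundle) {
            processor.processElement(element);
        }
        processor.finishBundle();
    }
}

/** Records the order in which the hooks are called. */
final class RecordingFn implements LifecycleFn<String> {
    final List<String> calls = new ArrayList<>();

    public void setup() { calls.add("setup"); }
    public void startBundle() { calls.add("start"); }
    public void processElement(String element) { calls.add("process:" + element); }
    public void finishBundle() { calls.add("finish"); }
    public void teardown() { calls.add("teardown"); }
}
```

In this sketch runBundle accepts anything that processes bundles, which is why it has no place to call setup or teardown; only the outer driver, which owns the whole object lifetime, can do that.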

Re: [DISCUSS] Python SDK status and next steps

2017-01-17 Thread Kenneth Knowles
Seems reasonable, and the timeline Davor suggests makes a lot of sense. On Tue, Jan 17, 2017 at 3:59 PM, Lukasz Cwik wrote: > I'm also for merging to master. > > On Tue, Jan 17, 2017 at 3:39 PM, Jean-Baptiste Onofré > wrote: > > > It makes sense to merge after 0.5.0 release. > > > > Good point

Re: Hosting data stores for IO Transform testing

2017-01-17 Thread Stephen Sisk
hi! I've been continuing this investigation, and have some more info to report, and hopefully we can start making some decisions. To support performance testing, I've been investigating mesos+marathon and kubernetes for running data stores in their high availability mode. I have been examining fe

Re: IO Integration tests - concrete proposal

2017-01-17 Thread Lukasz Cwik
Since docker containers can run a script on startup, can we embed the initial data set into that script/container build so that the same docker container and initial data set can be used across multiple ITs. For example, if Python and Java both have JdbcIO, it would be nice if they could leverage t

IO Integration tests - concrete proposal

2017-01-17 Thread Stephen Sisk
Hi all! As I've discussed previously on this list[1], ensuring that we have high quality IO Transforms is important to beam. We want to do this without adding too much burden on developers wanting to contribute. Below I have a concrete proposal for what an IO integration test would look like and a

Re: Composite Types and the Runner API

2017-01-17 Thread Lukasz Cwik
+1 since this brings us closer to a portability story. On Tue, Jan 17, 2017 at 3:10 PM, Jean-Baptiste Onofré wrote: > +1 > > It makes sense. > > Thanks ! > Regards > JB > > > On 01/17/2017 10:46 AM, Thomas Groh wrote: > >> Hey everyone; >> >> I've been working on parts of the runner API recently

Re: [DISCUSS] Python SDK status and next steps

2017-01-17 Thread Lukasz Cwik
I'm also for merging to master. On Tue, Jan 17, 2017 at 3:39 PM, Jean-Baptiste Onofré wrote: > It makes sense to merge after 0.5.0 release. > > Good point Davor: +1 > > Regards > JB > > > On 01/17/2017 03:34 PM, Davor Bonaci wrote: > >> +1. I think merging to master would be an awesome next step

Re: [DISCUSS] Python SDK status and next steps

2017-01-17 Thread Jean-Baptiste Onofré
It makes sense to merge after 0.5.0 release. Good point Davor: +1 Regards JB On 01/17/2017 03:34 PM, Davor Bonaci wrote: +1. I think merging to master would be an awesome next step for the Python SDK. And, thanks for a great summary of the current state, roadmap, and impact to the project as

Re: [DISCUSS] Python SDK status and next steps

2017-01-17 Thread Davor Bonaci
+1. I think merging to master would be an awesome next step for the Python SDK. And, thanks for a great summary of the current state, roadmap, and impact to the project as a whole -- awesome! Process-wise, I'd suggest starting a formal vote once this discussion seems to be trending towards a conc

Re: Composite Types and the Runner API

2017-01-17 Thread Jean-Baptiste Onofré
+1 It makes sense. Thanks ! Regards JB On 01/17/2017 10:46 AM, Thomas Groh wrote: Hey everyone; I've been working on parts of the runner API recently, and part of that has included a shift of how composite inputs and outputs must be represented by the time a PipelineRunner begins to access th

Re: Build failed in Jenkins: beam_PostCommit_Java_MavenInstall #2365

2017-01-17 Thread Jason Kuster
Manual build, screwing with settings, safe to ignore. On Tue, Jan 17, 2017 at 2:05 PM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See > > > -- > [...truncated 145 line

Re: Composite Types and the Runner API

2017-01-17 Thread Robert Bradshaw
The need for composite PValues is closely tied to the language of the SDK, but not an intrinsic part of the to-be-executed pipeline (e.g. Python doesn't have PCollectionList because a list of PCollections does the job just fine (better, in fact), but Java needs it to preserve static typing). So +

Re: Composite Types and the Runner API

2017-01-17 Thread Amit Sela
This looks pretty straightforward, let me know if you need anything on the Spark runner front here. +1 On Tue, Jan 17, 2017 at 9:13 PM Kenneth Knowles wrote: > This is a nice concrete (and inevitable) step towards our SDK-agnostic > pipeline representation. > > +1 from me! > > On Tue, Jan 17, 2

Re: @ProcessElement and private methods

2017-01-17 Thread Lukasz Cwik
Requiring public is also a form of documentation for implementers and maintainers that this method is meant to be called by someone else. All our arguments have been around best practices. I think the only technical argument is that if a runner today chooses to execute with a security manager, it

Re: Watermark reading API

2017-01-17 Thread Lukasz Cwik
For a pipeline that only uses data-driven triggers, many of our supported use cases like window expiry calculation for GC and side input readiness fall apart and we have become reliant on a single monotonically increasing value. On Tue, Jan 17, 2017 at 11:04 AM, Kenneth Knowles wrote: > On Fri,

Re: Composite Types and the Runner API

2017-01-17 Thread Kenneth Knowles
This is a nice concrete (and inevitable) step towards our SDK-agnostic pipeline representation. +1 from me! On Tue, Jan 17, 2017 at 10:46 AM, Thomas Groh wrote: > Hey everyone; > > I've been working on parts of the runner API recently, and part of that has > included a shift of how composite in

Re: Watermark reading API

2017-01-17 Thread Kenneth Knowles
On Fri, Jan 13, 2017 at 5:28 PM, Robert Bradshaw < rober...@google.com.invalid> wrote: > > I don't think the watermark itself is even really part of the Beam > model, and other runners may implement things differently. > This is an interesting perspective. I think it might be a bold claim about a

Composite Types and the Runner API

2017-01-17 Thread Thomas Groh
Hey everyone; I've been working on parts of the runner API recently, and part of that has included a shift of how composite inputs and outputs must be represented by the time a PipelineRunner begins to access them. I have a PR that completes this work within the Java SDK, but wanted to ensure that

Re: @ProcessElement and private methods

2017-01-17 Thread Stas Levin
We stumbled across this issue when we were doing some Scala-Java interop and had an anonymous class that was supposed to have an element processing method. The Scala compiler had decided to make that method private, and so it was not visible to Beam. While extracting this class to an upper level (i

Re: On my activity at the project

2017-01-17 Thread Kenneth Knowles
Great to work with you so far, and looking forward to it in the future. Enjoy your time off! Kenn On Sat, Jan 14, 2017 at 12:04 AM, Maximilian Michels wrote: > Dear Beamers, > > Thank you for the past year where we built this amazing community! It's > been exciting times. > > For the beginning

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-17 Thread Kenneth Knowles
This ticket is still relevant. Response inline. On Tue, Jan 17, 2017 at 8:52 AM, Jean-Baptiste Onofré wrote: > OK, but I'm afraid we would be too specific. If the batching is just a > List or Set that we populate in @ProcessElement and flush in > @FinishBundle (or when we raise a limit), I don't

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-17 Thread Etienne Chauchot
I agree, not a lot of value; it is just a little utility function that saves users from coding it themselves and avoids mistakes (for example forgetting to flush the batch in finishBundle, as a user reported on Stack Overflow). I guess Ben created the ticket because some users were searching for that kind of fu

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-17 Thread Ben Chambers
We should start by understanding the goals. If elements are in different windows can they be put in the same batch? If they have different timestamps what timestamp should the batch have? As a composite transform this will likely require a group by key which may affect performance. Maybe within a

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-17 Thread Jean-Baptiste Onofré
OK, but I'm afraid we would be too specific. If the batching is just a List or Set that we populate in @ProcessElement and flush in @FinishBundle (or when we reach a limit), I don't know if it brings a lot of value. People might want a more fine-grained logic (based on the type in the list for
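
A minimal sketch of the pattern described in this message, in plain Java with no Beam dependency. BatchingBuffer is a hypothetical name, not a Beam API; its methods only mirror the @ProcessElement and @FinishBundle hooks mentioned above, and the size limit and sink callback are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper, not a Beam API: buffers elements and hands them
 * to a sink in batches of at most maxBatchSize.
 */
final class BatchingBuffer<T> {
    private final int maxBatchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    BatchingBuffer(int maxBatchSize, Consumer<List<T>> sink) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        this.maxBatchSize = maxBatchSize;
        this.sink = sink;
    }

    /** Would be called from the element hook: buffer, then flush when full. */
    void processElement(T element) {
        buffer.add(element);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    /** Would be called from the end-of-bundle hook: flush whatever is left. */
    void finishBundle() {
        flush();
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Hand the sink a copy so that clearing the buffer does not affect it.
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

The sketch deliberately ignores windows and timestamps, which are the open questions raised elsewhere in this thread; it only shows the part that is easy to get wrong by hand, namely flushing the remainder at the end of a bundle.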

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-17 Thread Etienne Chauchot
Hi JB, I meant jira vote but discussion on the ML works also :) As I understand the need (see stackoverflow links in jira ticket) the aim is to avoid the user having to code the batching logic in his own DoFn.processElement() and DoFn.finishBundle() regardless of the bundles. For example, pos

Re: @ProcessElement and private methods

2017-01-17 Thread Ben Chambers
We thought about it when this was added. We decided against it because these are overrides-in-spirit. Letting them be private would be misleading because they are called from outside the class and should be thought of in that way. Also, this seems similar to others in the Java ecosystem: JUnit tes

Re: @ProcessElement and private methods

2017-01-17 Thread Jean-Baptiste Onofré
Hi Stas, Just curious: what would be the use case for a private method? Being able to reuse the DoFn class in different contexts? Regards JB On Jan 17, 2017, 07:33, at 07:33, Stas Levin wrote: >Hi, > >At the moment only public methods are eligible to be decorated with >@ProcessElement (DoFnSig

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-17 Thread Jean-Baptiste Onofré
Hi, I guess you mean discussion on the mailing list about that, right? AFAIR the idea is to provide a utility class to deal with pooling/batching. However, I'm not sure it's required, given @StartBundle etc. in DoFn, and batching depends on the end user's "logic". Regards JB On Jan 17, 2017, 08:26,

Re: [DISCUSS] Python SDK status and next steps

2017-01-17 Thread Jean-Baptiste Onofré
Hi, I haven't tried the Python SDK recently, but you provided a clear "state of the art". Anyway, I'm in favor of merging things as quickly as possible (assuming it's in good shape in terms of build, test, ...): it would potentially grow the "external" contributions. So +1 from my side. Regards J

[BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-17 Thread Etienne Chauchot
Hi all, I have started to work on this ticket: https://issues.apache.org/jira/browse/BEAM-135 As there were no votes since March 18th, is the issue still relevant/needed? Regards, Etienne

[DISCUSS] Python SDK status and next steps

2017-01-17 Thread Ahmet Altay
Hi all, tl;dr: I would like to start a discussion about merging python-sdk branch to master branch. Python SDK is mature enough and merging it to master will accelerate its development and adoption. With a great effort from a lot of contributors(*), Python SDK [1] is now a mostly complete, tested

@ProcessElement and private methods

2017-01-17 Thread Stas Levin
Hi, At the moment only public methods are eligible to be decorated with @ProcessElement (DoFnSignatures#findAnnotatedMethod). Seems that from a technical point of view, it would be quite possible to consider private methods as well. In fact, it's one of the benefits of moving towards an annotatio
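
To make the visibility question concrete, here is a small sketch using only java.lang.reflect. It does not show how DoFnSignatures is implemented; @ProcessHook, PublicFn, PrivateFn and AnnotatedMethods are hypothetical names invented for illustration. It only demonstrates the general Java behaviour that a lookup over public methods misses an annotated private method, while a lookup over declared methods finds it.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an annotation such as @ProcessElement. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ProcessHook {}

class PublicFn {
    @ProcessHook
    public String handle(String s) { return "public:" + s; }
}

class PrivateFn {
    @ProcessHook
    private String handle(String s) { return "private:" + s; }
}

final class AnnotatedMethods {
    /** Looks only at public methods (Class#getMethods). */
    static List<Method> findPublic(Class<?> cls) {
        return filter(cls.getMethods());
    }

    /** Looks at methods declared on the class, whatever their visibility. */
    static List<Method> findDeclared(Class<?> cls) {
        return filter(cls.getDeclaredMethods());
    }

    private static List<Method> filter(Method[] methods) {
        List<Method> found = new ArrayList<>();
        for (Method m : methods) {
            if (m.isAnnotationPresent(ProcessHook.class)) {
                found.add(m);
            }
        }
        return found;
    }

    /**
     * Invokes a possibly private method. setAccessible can be refused,
     * for example under a security manager or by module restrictions.
     */
    static Object invoke(Method m, Object target, Object... args) {
        try {
            m.setAccessible(true);
            return m.invoke(target, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not invoke " + m.getName(), e);
        }
    }
}
```

Calling a private method this way depends on setAccessible succeeding, which is the kind of environment-dependent step that a public-only rule avoids.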