Re: [PROPOSAL] Merge gearpump-runner to master

2017-08-10 Thread Manu Zhang
Hi Paul, The latest master compiles fine for me. Could you check again ? You may also want to check out the contribution guide . In short, the Apache way is to file a JIRA issue and submit a

Re: Requiring PTransform to set a coder on its resulting collections

2017-08-10 Thread Reuven Lax
Interestingly I've seen examples of PTransforms where the transform itself is unable to easily set its own coder. This happens when the transform is parametrized in such a way that its ouput coder is not determinable except by the caller of the PTransform. The caller can of course pass a coder

Re: Style of messages for checkArgument/checkNotNull in IOs

2017-08-10 Thread Eugene Kirpichov
https://beam.apache.org/contribute/ptransform-style-guide/#validation now includes the new guidance. It also includes updated guidance on what to put in expand() vs. validate() (TL;DR: validate() is almost always unnecessary. Put almost all validation in expand()) On Fri, Jul 28, 2017 at 11:56

Re: Requiring PTransform to set a coder on its resulting collections

2017-08-10 Thread Eugene Kirpichov
I've updated the guidance in PTransform Style Guide on setting coders https://beam.apache.org/contribute/ptransform-style-guide/#coders according to this discussion. https://github.com/apache/beam-site/pull/279 On Thu, Aug 3, 2017 at 6:27 PM Robert Bradshaw wrote: >

Re: beam-site issues with Jenkins and MergeBot

2017-08-10 Thread Jason Kuster
Investigating mergebot outage currently. Apologies for the downtime. On Wed, Aug 9, 2017 at 9:55 PM, Eugene Kirpichov wrote: > Indeed beam-site is at https://gitbox.apache.org/repos/asf/beam-site.git > now. >

Re: [PROPOSAL] "Requires deterministic input"

2017-08-10 Thread Reuven Lax
On Thu, Aug 10, 2017 at 11:18 AM, Thomas Groh wrote: > I think it must imply fixed content s - making a decision based > on the contents of an iterable assuming the Iterable is deterministic seems > an acceptable use of the API, and that requires the contents

Re: [PROPOSAL] "Requires deterministic input"

2017-08-10 Thread Thomas Groh
I think it must imply fixed content s - making a decision based on the contents of an iterable assuming the Iterable is deterministic seems an acceptable use of the API, and that requires the contents to be identical through failures. This does imply that (assuming this is reading

Re: [PROPOSAL] "Requires deterministic input"

2017-08-10 Thread Reuven Lax
It means that single element replay is stable. On Thu, Aug 10, 2017 at 10:56 AM, Raghu Angadi wrote: > Can we define what exactly is meant by deterministic/stable/replayable > etc? > >- Does it imply a fixed order? If yes, it implies fixed order of >

Re: Exactly-once Kafka sink

2017-08-10 Thread Raghu Angadi
On Thu, Aug 10, 2017 at 5:15 AM, Aljoscha Krettek wrote: > Ah, also regarding your earlier mail: I didn't know if many people were > using Kafka with Dataflow, thanks for that clarification! :-) > > Also, I don't think that the TwoPhaseCommit Sink of Flink would work in a >

Re: [PROPOSAL] "Requires deterministic input"

2017-08-10 Thread Scott Wegner
Does requires-stable-input only apply to ParDo transforms? I don't think it would make sense to annotate to composite, because checkpointing should happen as close to the side-effecting operation as possible, since upstream transforms within a composite could introduce non-determinism. So it's

Re: [PROPOSAL] "Requires deterministic input"

2017-08-10 Thread Tyler Akidau
+1 to the annotation idea, and to having it on processTimer. -Tyler On Thu, Aug 10, 2017 at 2:15 AM Aljoscha Krettek wrote: > +1 to the annotation approach. I outlined how implementing this would work > in the Flink runner in the Thread about the exactly-once Kafka Sink. >

Re: streaming output in just one files

2017-08-10 Thread Reuven Lax
On Thu, Aug 10, 2017 at 8:29 AM, Reuven Lax wrote: > This is how the file sink has always worked in Beam. If no sharding is > specified, then this means runner-determined sharding, and by default that > is one file per bundle. If Flink has small bundles, then I suggest using >

Re: Exactly-once Kafka sink

2017-08-10 Thread Aljoscha Krettek
Ah, also regarding your earlier mail: I didn't know if many people were using Kafka with Dataflow, thanks for that clarification! :-) Also, I don't think that the TwoPhaseCommit Sink of Flink would work in a Beam context, I was just posting that for reference. Best, Aljoscha > On 10. Aug 2017,

Re: [PROPOSAL] "Requires deterministic input"

2017-08-10 Thread Aljoscha Krettek
+1 to the annotation approach. I outlined how implementing this would work in the Flink runner in the Thread about the exactly-once Kafka Sink. > On 9. Aug 2017, at 23:03, Reuven Lax wrote: > > Yes - I don't think we should try and make any deterministic guarantees >

Re: Exactly-once Kafka sink

2017-08-10 Thread Aljoscha Krettek
@Raghu: Yes, exactly, that's what I thought about this morning, actually. These are the methods of an operator that are relevant to checkpointing: class FlinkOperator() { open(); snapshotState(): notifySnapshotComplete(); initializeState(); } Input would be buffered in state, would be

Jenkins build is back to normal : beam_Release_NightlySnapshot #499

2017-08-10 Thread Apache Jenkins Server
See