Re: Custom window merging

2017-07-26 Thread Robert Bradshaw
On Wed, Jul 26, 2017 at 9:43 PM, Kenneth Knowles wrote: > This is a bit of an improvised change to the Beam model, if these are > really treated *that* specially. (notably, they are a subset of the > WindowFns that we ship with our SDKs, so it really is a careful selection) > > It does make sense

Re: Custom window merging

2017-07-26 Thread Kenneth Knowles
This is a bit of an improvised change to the Beam model, if these are really treated *that* specially. (notably, they are a subset of the WindowFns that we ship with our SDKs, so it really is a careful selection) It does make sense to have some special WindowFns with distinguished semantics, since

Re: Custom window merging

2017-07-26 Thread Robert Bradshaw
I think there may be a distinction between hard-coding support for the "standard" WindowFns (e.g. https://github.com/apache/beam/blob/master/sdks/common/runner-api/src/main/proto/standard_window_fns.proto) and accepting WindowFns as a UDF. Different runners have offered different levels of support

Re: Custom window merging

2017-07-26 Thread Kenneth Knowles
Hi Etienne, Every WindowFn is a UDF, so there is really no such thing as "custom" window merging. Is this the same as saying that a runner supports only merging for Sessions? Or just supports WindowFn that merges based on overlap? Kenn On Mon, Jul 24, 2017 at 10:15 AM, Etienne Chauchot wrote:
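The thread above turns on what "merging based on overlap" means. The core of Sessions-style merging (each element opens a window of one gap duration, and overlapping windows collapse into one) can be sketched in plain Java, independent of the Beam WindowFn API; the Interval type here is a hypothetical stand-in for Beam's IntervalWindow:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SessionMerge {
    // A half-open time interval [start, end), like Beam's IntervalWindow.
    record Interval(long start, long end) {}

    // Merge intervals that overlap or touch, as Sessions-style merging does:
    // sort by start, then fold each window into the previous one when they meet.
    static List<Interval> merge(List<Interval> windows) {
        List<Interval> sorted = new ArrayList<>(windows);
        sorted.sort(Comparator.comparingLong(Interval::start));
        List<Interval> out = new ArrayList<>();
        for (Interval w : sorted) {
            if (!out.isEmpty() && out.get(out.size() - 1).end() >= w.start()) {
                Interval last = out.remove(out.size() - 1);
                out.add(new Interval(last.start(), Math.max(last.end(), w.end())));
            } else {
                out.add(w);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Elements at t=0, 5, 20 with gap 10 -> windows [0,10), [5,15), [20,30);
        // the first two overlap and merge, the third stays separate.
        List<Interval> merged = merge(List.of(
            new Interval(0, 10), new Interval(5, 15), new Interval(20, 30)));
        System.out.println(merged);
    }
}
```

A runner that "supports only merging for Sessions" effectively hard-codes this overlap rule, whereas a WindowFn-as-UDF may merge by any criterion at all, which is the distinction the thread is probing.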

Re: Requiring PTransform to set a coder on its resulting collections

2017-07-26 Thread Kenneth Knowles
On Wed, Jul 26, 2017 at 8:58 PM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > Hmm, yes, I just noticed that PCollection has a setTypeDescriptor() method, > and I wonder how much will break if all call sites of setCoder() will call > setTypeDescriptor() instead - i.e. how far are we fr

Re: Requiring PTransform to set a coder on its resulting collections

2017-07-26 Thread Eugene Kirpichov
Hmm, yes, I just noticed that PCollection has a setTypeDescriptor() method, and I wonder how much will break if all call sites of setCoder() will call setTypeDescriptor() instead - i.e. how far are we from the ideal state of having a coder inferrable for every sufficiently concrete type descriptor.
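The inference Eugene describes (a coder derivable from every sufficiently concrete type descriptor) can be illustrated with a minimal registry sketch in plain Java. The names here are hypothetical, not Beam's CoderRegistry API; the point is only that lookup succeeds for concrete registered types and fails for overly generic ones:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class CoderInference {
    // Hypothetical stand-in for Beam's Coder<T>: just a name, for the demo.
    record Coder(String name) {}

    private final Map<Class<?>, Coder> registry = new HashMap<>();

    void register(Class<?> type, Coder coder) {
        registry.put(type, coder);
    }

    // Mirrors the intent of coder inference from a type descriptor:
    // a coder comes back only when the type is concrete enough to have one.
    Optional<Coder> infer(Class<?> type) {
        return Optional.ofNullable(registry.get(type));
    }

    public static void main(String[] args) {
        CoderInference reg = new CoderInference();
        reg.register(String.class, new Coder("StringUtf8Coder"));
        reg.register(Long.class, new Coder("VarLongCoder"));
        System.out.println(reg.infer(String.class)); // a coder is found
        System.out.println(reg.infer(Object.class)); // empty: too generic
    }
}
```

"How far are we from the ideal state" then amounts to asking how often a call site's type descriptor is concrete enough for a lookup like this to succeed.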

Re: Requiring PTransform to set a coder on its resulting collections

2017-07-26 Thread Kenneth Knowles
+1 but maybe go even further On Tue, Jul 25, 2017 at 8:25 PM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > Hello, > > I've worked on a few different things recently and ran repeatedly into the > same issue: that we do not have clear guidance on who should set the Coder > on a PCollec

Re: Requiring PTransform to set a coder on its resulting collections

2017-07-26 Thread Eugene Kirpichov
Okay, first PR is in review https://github.com/apache/beam/pull/3649 On Wed, Jul 26, 2017 at 11:58 AM Robert Bradshaw wrote: > +1, I'm a huge fan of moving this direction. Right now there's also > the ugliness that setCoder() may be called any number of times before > a PCollection is used (the

Re: Should Pipeline wait till all processing time timers fire before exit?

2017-07-26 Thread Robert Bradshaw
On Wed, Jul 26, 2017 at 7:45 AM, Lukasz Cwik wrote: > Robert, in your case where output is being produced based upon a heartbeat, > either the watermark on the output went to infinity and all that data being > produced is droppable at which point the timer becomes droppable But why are these time

Re: Requiring PTransform to set a coder on its resulting collections

2017-07-26 Thread Robert Bradshaw
+1, I'm a huge fan of moving this direction. Right now there's also the ugliness that setCoder() may be called any number of times before a PCollection is used (the last setter winning) but is an error to call it once it has been used (and here "used" is not clear--if a PCollection is returned from

Re: Requiring PTransform to set a coder on its resulting collections

2017-07-26 Thread Mingmin Xu
Second that 'it's responsibility of the transform'. For the case when a PTransform doesn't have enough information (PTransform developer should have the knowledge), I would prefer a strict way so users won't forget to call withSomethingCoder(), like - a Coder is required to new the PTransform; - or
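Mingmin's first "strict way" (a Coder is required to new the PTransform) can be sketched as a constructor-enforced invariant, so omitting the coder fails at construction time rather than at run time. Class and method names here are hypothetical:

```java
import java.util.Objects;

public class StrictTransform {
    // Hypothetical stand-in for Beam's Coder<T>.
    record Coder(String name) {}

    private final Coder outputCoder;

    // The coder is a required constructor argument, so a user
    // cannot forget to supply it before the transform is applied.
    StrictTransform(Coder outputCoder) {
        this.outputCoder = Objects.requireNonNull(
            outputCoder, "an output Coder must be supplied at construction");
    }

    Coder outputCoder() {
        return outputCoder;
    }

    public static void main(String[] args) {
        StrictTransform ok = new StrictTransform(new Coder("AvroCoder"));
        System.out.println(ok.outputCoder().name());
        try {
            new StrictTransform(null); // rejected immediately
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The alternative in the message, a withSomethingCoder() setter validated at expansion time, trades this construction-time guarantee for a more fluent builder style.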

Re: Requiring PTransform to set a coder on its resulting collections

2017-07-26 Thread Eugene Kirpichov
Hm, can you elaborate? I'm not sure how this relates to my suggestion, the gist of which is "PTransform's should set the coder on all of their outputs, and the user should never have to .setCoder() on a PCollection obtained from a PTransform" On Wed, Jul 26, 2017 at 7:38 AM Lukasz Cwik wrote: >

Re: Dynamic file-based sinks

2017-07-26 Thread Reuven Lax
Yes, there was! TextIO support is already merged into Beam (it missed the 2.1 cutoff, so it will be in Beam 2.2.0). AvroIO support is in https://github.com/apache/beam/pull/3541. This is almost ready to merge - still waiting for final review from kennknowles on the Beam translation changes. Nobody
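Value-dependent destinations of the kind discussed here reduce to a function from each record to its destination, applied per element. A plain-Java sketch of that routing idea (not the Beam DynamicDestinations API; the per-type "table" example is hypothetical):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class DynamicRouting {
    // Group records by a destination derived from each record's value,
    // the way a dynamic sink writes per-table or per-bucket outputs.
    static <T> Map<String, List<T>> route(List<T> records,
                                          Function<T, String> destinationFn) {
        Map<String, List<T>> byDestination = new HashMap<>();
        for (T record : records) {
            byDestination
                .computeIfAbsent(destinationFn.apply(record), d -> new ArrayList<>())
                .add(record);
        }
        return byDestination;
    }

    public static void main(String[] args) {
        // Hypothetical: route events to an output named after the event type.
        Map<String, List<String>> out = route(
            List.of("click:a", "view:b", "click:c"),
            r -> r.split(":")[0]);
        System.out.println(out.keySet());
    }
}
```

In the sinks discussed in the thread the destination function is user-supplied in exactly this shape, while the runner handles the per-destination file or table lifecycle.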

Re: Should Pipeline wait till all processing time timers fire before exit?

2017-07-26 Thread Lukasz Cwik
Robert, in your case where output is being produced based upon a heartbeat, either the watermark on the output went to infinity and all that data being produced is droppable at which point the timer becomes droppable or the output watermark is being held by the scheduling of the next timer and henc
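Lukasz's dichotomy can be stated as a small predicate: if the output watermark has already advanced to infinity, anything a pending processing-time timer would produce is droppable, so the timer is too; otherwise the timer is holding the output watermark and the pipeline must wait for it. A plain-Java sketch of that reasoning (not runner code; the infinity encoding is an assumption for the demo):

```java
public class TimerDrain {
    // "Watermark at infinity" modeled as Long.MAX_VALUE for this sketch.
    static final long WATERMARK_MAX = Long.MAX_VALUE;

    // Case 1: output watermark at +infinity -> all of the timer's potential
    // output is droppable, so the timer may be dropped at shutdown.
    // Case 2: otherwise the timer holds the output watermark, and the
    // pipeline cannot finish until the timer fires.
    static boolean timerIsDroppable(long outputWatermark) {
        return outputWatermark == WATERMARK_MAX;
    }

    public static void main(String[] args) {
        System.out.println(timerIsDroppable(WATERMARK_MAX)); // droppable
        System.out.println(timerIsDroppable(1_000L));        // must wait
    }
}
```

Robert's follow-up question is precisely about case 1: why would a heartbeat-style timer ever find itself past an infinite output watermark while still scheduled.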

Re: Dynamic file-based sinks

2017-07-26 Thread Josh
Hi all, Was there any progress on this recently? I am particularly interested in using value-dependent destinations in BigtableIO (writing to a specific table depending on the value) and AvroIO (writing to specific GCS buckets depending on the value). Thanks, Josh On Fri, Jun 9, 2017 at 5:35 PM,

Re: Requiring PTransform to set a coder on its resulting collections

2017-07-26 Thread Lukasz Cwik
I'm split between our current one pass model of pipeline construction and a two pass model where all information is gathered and then PTransform expansions are performed. On Tue, Jul 25, 2017 at 8:25 PM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > Hello, > > I've worked on a few di