Re: Event-time based in-window trigger

2016-12-08 Thread Kenneth Knowles
question while we're at it: what if you have events >> happening every second within the window? Do you really want to emit a new >> pane every second as the watermark progresses (assuming it progresses >> relatively smoothly)? What if we're talking differences of event times of >

Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

2016-12-08 Thread Kenneth Knowles
> > > > a construction time exception since it would require adding code to > > every > > > > PTransform implementation. > > > > > > > > On Wed, Dec 7, 2016 at 1:55 PM Thomas Groh <tg...@google.com.invalid > > > > > > wrote: > >

Re: Event-time based in-window trigger

2016-12-01 Thread Kenneth Knowles
Thanks for laying out some details. On Thu, Dec 1, 2016 at 7:09 PM, Manu Zhang wrote: > > Yes, the difficulty is to define that trigger. The existing triggers fire > at the end of window. (I could be mistaken, which will be good news) > You are not mistaken that the

Re: GroupByKey and CombineFn: internals

2016-11-22 Thread Kenneth Knowles
One minor correction I wanted to note: when you use Combine.perKey(SerializableFunction) the SerializableFunction plays roughly the same role as mergeAccumulators (and the Combine is implemented just so) , so it requires that the input type V is the accumulator type, as with

Re: Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-31 Thread Kenneth Knowles
Manu, I think your critique about user interface clarity is valid. CombineFn conflates a few operations and is not that clear about what it is doing or why. You seem to be concerned about CombineFn versus SerializableFunction constructors for the Combine family of transforms. I thought I'd respond

Re: Flink Twitter Example

2016-10-14 Thread Kenneth Knowles
Hi Trevor, The problem is that "Write" is an old name that should be changed to "BoundedWrite" (actually it is even more specific). In fact, it re-windows into the global window and removes all triggering, so it is suitable only for bounded PCollections where this will ensure all the data arrives

Re: changing the allowed skew

2016-08-08 Thread Kenneth Knowles
Hi Amir, I believe you should use KafkaIO#withTimestampFn [1]. For unbounded PCollections, the source itself needs to know about the timestamps so it can maintain a good watermark. The example you are editing uses a bounded input, which has different implications for the watermark. The text in

Re: Is Beam pipeline runtime behavior inconsistent?

2016-08-08 Thread Kenneth Knowles
Hi Amir, How are you assembling your classpath? The library that contains that class should be automatically included. Kenn On Mon, Aug 8, 2016 at 5:30 PM, amir bahmanyari wrote: > Yes :) I figured it out and it compiled. > now ClassNotFound ar runtime:

[ANNOUNCE] A brand new DoFn @ HEAD (action required)

2016-08-03 Thread Kenneth Knowles
Hi all, We have just incorporated a major feature into the master branch of the Beam codebase: a new DoFn. Assuming you are tracking HEAD, this will require some (easy) adjustment to your pipelines. The experimental feature that was in the codebase as DoFnWithContext is now just "DoFn" for Beam.

[ANNOUNCE] Beam-related talks this week, in and around Strata+Hadoop World

2016-05-30 Thread Kenneth Knowles
by Slava Chernyak. ( http://conferences.oreilly.com/strata/hadoop-big-data-eu/public/schedule/detail/49605 ) - "Stream Processing AMA with Apache Beam/Google Cloud Dataflow engineers" with Tyler Akidau, Slava Chernyak, and Kenneth Knowles. ( http://conferences.oreilly.com/strat

Re: Writing Out List

2016-05-24 Thread Kenneth Knowles
gt; PCollection<List> with a single element containing the smallest count >> elements >> of the inputPCollection, in increasing order, sorted according to >> their natural order. >> >> It also says: >> >> All the elements of the result's List must f

Re: Capability matrix question

2016-03-23 Thread Kenneth Knowles
+1 to considering "metric" / PMetric / etc. On Wed, Mar 23, 2016 at 8:09 AM, Amit Sela wrote: > How about "PMetric" ? > > On Wed, Mar 23, 2016, 16:53 Frances Perry wrote: > >> Perhaps I'm unclear on what an “Aggregator” is. I assumed that a line