Re: [ANNOUNCEMENT] New committers, May 2018 edition!

2018-06-01 Thread Thomas Groh
Congrats, you three! On Thu, May 31, 2018 at 7:09 PM Davor Bonaci wrote: > Please join me and the rest of Beam PMC in welcoming the following contributors as our newest committers. They have significantly contributed to the project in different ways, and we look forward to many more contri

[VOTE] Code Review Process

2018-06-01 Thread Thomas Groh
As we seem to largely have consensus in "Reducing Committer Load for Code Reviews"[1], this is a vote to change the Beam policy on Code Reviews to require that (1) At least one committer is involved with the code review, as either a reviewer or as the author (2) A contributor has approved the chan

Reducing Committer Load for Code Reviews

2018-05-30 Thread Thomas Groh
Hey all; I've been thinking recently about our current process for committing code. I'd like to propose that we change it to require that at least one committer be present for each code review, but remove the need to have a second committer review the code prio

Re: [VOTE] Go SDK

2018-05-23 Thread Thomas Groh
+1! I, for one, could not be more excited about our glorious portable future. On Mon, May 21, 2018 at 6:03 PM Henning Rohde wrote: > Hi everyone, > Now that the remaining issues have been resolved as discussed, I'd like to propose a formal vote on accepting the Go SDK into master. The main

Re: org.apache.beam.sdk.values.TupleTag#genId and stacktraces?

2018-04-10 Thread Thomas Groh
It may be reasonable to port most of those TupleTags to have an explicit, rather than a generated, ID, which will remove the need to inspect the stack trace. However, as mentioned, the constructor shouldn't provide an unstable ID, as otherwise most pipelines won't work on production runners. On Tue,

Re: org.apache.beam.sdk.values.TupleTag#genId and stacktraces?

2018-04-10 Thread Thomas Groh
In fact, this is explicitly to work with `static final` TupleTags, and using a non-stable ID isn't feasible. A `static final` TupleTag won't be serialized in the closure of an object that uses it; it will be instantiated independently in any other ClassLoader, such as on a remote JVM. If you use a con
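
A minimal sketch of why a tag's ID must be stable, assuming (as the messages above describe) that two tags match only when their IDs are equal. `Tag` and `TagDemo` are hypothetical stand-ins, not Beam's `TupleTag`; calling `initializeOnWorker` stands in for a `static final` field being initialized independently in another ClassLoader or JVM.

```java
import java.util.Objects;
import java.util.UUID;

/** Hypothetical stand-in for TupleTag: two tags match only if their ids are equal. */
final class Tag<T> {
  private final String id;

  Tag(String id) {
    this.id = Objects.requireNonNull(id, "id");
  }

  /** Unstable strategy: every construction produces a different random id. */
  static <T> Tag<T> withRandomId() {
    return new Tag<T>(UUID.randomUUID().toString());
  }

  String getId() {
    return id;
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof Tag && ((Tag<?>) other).id.equals(id);
  }

  @Override
  public int hashCode() {
    return id.hashCode();
  }
}

final class TagDemo {
  /** Simulates `static final Tag<String> MAIN = ...` being initialized again on a remote worker. */
  static Tag<String> initializeOnWorker(boolean explicitId) {
    return explicitId ? new Tag<String>("main-output") : Tag.<String>withRandomId();
  }
}
```

With an explicit ID, the tag rebuilt on the worker equals the one used at construction time; with a random ID it does not, so anything routed by that tag would fail to line up.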

Re: NoSuchElementException in reader.getCurrent*.

2018-03-13 Thread Thomas Groh

Re: Flatten input data streams with skewed watermark progress

2018-03-12 Thread Thomas Groh
That one would be, for example, having one PCollection with a highly advanced watermark and another with a much earlier watermark, and having an element that is behind the watermark of the former PCollection go through the Flatten, at which point it is ahead of the Flatten's output watermark. That's f
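
A small model of that situation, assuming (as is standard in Beam) that a Flatten's output watermark is the minimum of its input watermarks, and that an element is "behind" a watermark when its timestamp is strictly earlier. Timestamps are plain `long` values and the class name is hypothetical.

```java
import java.util.List;

final class FlattenWatermarks {
  /** Output watermark of a Flatten: the minimum of its input watermarks (Long.MAX_VALUE if there are none). */
  static long outputWatermark(List<Long> inputWatermarks) {
    return inputWatermarks.stream().mapToLong(Long::longValue).min().orElse(Long.MAX_VALUE);
  }

  /** True if an element with this timestamp is behind (strictly earlier than) the given watermark. */
  static boolean isBehind(long elementTimestamp, long watermark) {
    return elementTimestamp < watermark;
  }
}
```

For inputs with watermarks 100 and 10, an element at timestamp 50 is behind the first input's watermark but ahead of the flattened output watermark of 10.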

Re: NoSuchElementException in reader.getCurrent*.

2018-03-12 Thread Thomas Groh

Re: NoSuchElementException in reader.getCurrent*.

2018-03-12 Thread Thomas Groh
If a call to `getCurrentWhatever` happens after `start` or `advance` has returned false, it's a bug in the runner, but the reader needs to be able to fail; otherwise you'll get a synthetic element that doesn't really exist. If a reader throws `NoSuchElementException` after the most recent call retu
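
A minimal in-memory reader sketching the contract described here: `getCurrent` is only valid after `start` or `advance` has returned true, and otherwise it throws `NoSuchElementException` instead of inventing an element. `ListReader` is hypothetical; a real `BoundedSource.BoundedReader` has more methods (timestamps, `close`, and so on).

```java
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical in-memory reader following the start/advance/getCurrent contract. */
final class ListReader<T> {
  private final List<T> elements;
  private int index = -1;           // position of the current element; -1 before start()
  private boolean hasCurrent = false;

  ListReader(List<T> elements) {
    this.elements = elements;
  }

  /** Moves to the first element; returns false if there is none. */
  boolean start() {
    index = 0;
    hasCurrent = index < elements.size();
    return hasCurrent;
  }

  /** Moves to the next element; only meaningful after start(). */
  boolean advance() {
    index++;
    hasCurrent = index < elements.size();
    return hasCurrent;
  }

  T getCurrent() {
    if (!hasCurrent) {
      // start()/advance() returned false (or was never called): there is no element to return.
      throw new NoSuchElementException("No current element");
    }
    return elements.get(index);
  }
}
```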

Re: to a modular embedded java runner to replace the direct runner?

2018-03-05 Thread Thomas Groh
The portable Java 'DirectRunner' is already in progress, and has been for several months; it's tracked by https://issues.apache.org/jira/browse/BEAM-2899. My expectation is that the actual portability augmentations are unlikely to require significant changes to the DirectRunner implementations. I'd

Re: @TearDown guarantees

2018-02-16 Thread Thomas Groh
On perf: Deserialization of an arbitrary object is expensive. This cost is amortized over all of the elements that the object processes, but for a runner with small bundles, that cost never gets meaningfully amortized - deserializing a DoFn instance of unknown complexity to process one element mean

Re: @TearDown guarantees

2018-02-16 Thread Thomas Groh
I'll note as well that you don't need a well-defined DoFn lifecycle method - you just want less granular bundling, which is a different requirement. Teardown has well-defined interactions with the rest of the DoFn methods, and what the runner is permitted to do when it calls Teardown - the fact th

Re: @TearDown guarantees

2018-02-16 Thread Thomas Groh
Given that I'm the original author of both the @Setup and @Teardown methods and the PR under discussion, I thought I'd drop in to give a bit of history and my thoughts on the issue. Originally (Dataflow 1.x), the spec required a Runner to deserialize a new instance of a DoFn for every Bundle. F
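
The amortization argument in these messages can be made concrete with a little arithmetic: a one-time deserialization cost C spread over n elements per bundle and b bundles per instance costs C / (n * b) per element. This sketch assumes a fixed, hypothetical cost per instance; the first method models the original one-instance-per-bundle rule, the second a runner that reuses an instance.

```java
final class DoFnAmortization {
  /** Per-element share of the cost when a fresh instance is deserialized for every bundle. */
  static double perElementCostFreshInstancePerBundle(double deserializationCost, int elementsPerBundle) {
    return perElementCostReused(deserializationCost, elementsPerBundle, 1);
  }

  /** Per-element share when one deserialized instance is reused for bundlesPerInstance bundles. */
  static double perElementCostReused(double deserializationCost, int elementsPerBundle, int bundlesPerInstance) {
    if (elementsPerBundle <= 0 || bundlesPerInstance <= 0) {
      throw new IllegalArgumentException("counts must be positive");
    }
    return deserializationCost / ((double) elementsPerBundle * bundlesPerInstance);
  }
}
```

With single-element bundles each element pays the full cost (100 units in the test below); reusing the instance for 1,000 such bundles brings the per-element share down to 0.1.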

Tracking Sickbayed tests in Jira

2018-01-31 Thread Thomas Groh
Hey everyone; I've realized that although we tend to tag any test we suppress (due to consistent flakiness) in the codebase, and file an associated JIRA issue with the failure, we don't have any centralized way to track tests that we're currently suppressing. To try and get more visibility into ou

Re: A personal update

2017-12-13 Thread Thomas Groh
It's good to see you around. Welcome back. On Wed, Dec 13, 2017 at 8:43 AM, Chamikara Jayalath wrote: > Welcome back :) - Cham > On Wed, Dec 13, 2017 at 8:41 AM Jason Kuster wrote: >> Glad to have you back. :) >> On Wed, Dec 13, 2017 at 8:32 AM, Eugene Kirpichov wrote: >>> H

Re: Guarding against unsafe triggers at construction time

2017-12-04 Thread Thomas Groh
>> ...rything downstream only once, but that expectation appears impossible to satisfy safely.
>> - *Make the continuation trigger of some triggers be the "invalid" trigger*, i.e. require the user to set it explicitly: there's in general no good and safe way to infer what a trigger on a second GBK "truly" should be, based on the trigger of the PCollection input into a first GBK. This is especially true for terminating triggers.
>> - *Prohibit top-level terminating triggers entirely.* This will ensure that the only data that ever gets dropped is "droppably late" data.
>> Do people think that these options are sensible? +Kenn Knowles +Thomas Groh +Ben Chambers is this a fair summary of our discussion?
>> Thanks!

Re: [VOTE] Use Gradle for Apache Beam developmental processes

2017-11-28 Thread Thomas Groh
+1 On Tue, Nov 28, 2017 at 10:04 AM, Valentyn Tymofieiev wrote: > +1 I support the process change > On Tue, Nov 28, 2017 at 9:56 AM, Kenneth Knowles wrote: >> +1 (binding) >> On Tue, Nov 28, 2017 at 9:55 AM, Lukasz Cwik wrote: >>> This is a procedural vote for migrating to use Grad

Re: [DISCUSS] Updating contribution guide for gitbox

2017-11-28 Thread Thomas Groh
I am strongly in favor of (1); I have no strong feelings about (2); I agree on (3), but generally am not hugely concerned, so long as back-references to the original PR are maintained, which is where most of the context lives. It is nice to have the change broken up into as many individually usef

Re: [DISCUSS] [Java] Private shaded dependency uber jars

2017-10-17 Thread Thomas Groh
+1 to the goal. I'm hugely in favor of not doing the same shading work every time for dependencies we know we'll use. This also means that if we end up pulling in transitive dependencies we don't want in any particular module, we can avoid having to adjust our repackaging strategy for that module -

Re: [VOTE] Migrate to gitbox

2017-10-10 Thread Thomas Groh
+1 On Tue, Oct 10, 2017 at 9:12 AM, Kenneth Knowles wrote: > +1 > On Tue, Oct 10, 2017 at 8:33 AM, Tyler Akidau wrote: >> +1 >> On Tue, Oct 10, 2017 at 2:13 AM Ismaël Mejía wrote: >>> +1 (non-binding) >>> On Tue, Oct 10, 2017 at 10:42 AM, Aljoscha Krettek <aljos...

Re: [DISCUSS] Switch to gitbox

2017-10-09 Thread Thomas Groh
+1. I do love myself a forcing function for passing tests. On Mon, Oct 9, 2017 at 7:51 AM, Aljoscha Krettek wrote: > +1 > On 6. Oct 2017, at 18:57, Kenneth Knowles wrote: >> Sounds great. If I recall correctly, it means we could also use assignment / review requests to pass pul

Re: Proposal: Unbreak Beam Python 2.1.0 with 2.1.1 bugfix release

2017-09-21 Thread Thomas Groh
+1 on cutting a release to fix this. As an aside, if we later determine that we require a release that includes Java, that release will be 2.1.2 (or equivalent) - the reason we aren't releasing Java artifacts is a matter of convenience (they have the same contents as the 2.1.0 release), not becaus

Re: Migration From 1.9.x to 2.1.0

2017-09-13 Thread Thomas Groh
For (1) and (4), the DoFn methods have been moved to be reflection-based. Instead of using `@Override` in your DoFns, you should annotate those methods with `@StartBundle`, `@ProcessElement`, and `@FinishBundle`. For (2), Aggregators have been removed. Our suggested replacement is the use
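
To illustrate what "reflection-based" means, here is a self-contained sketch of annotation-driven dispatch: the framework looks up the method carrying an annotation at runtime instead of calling an overridden method. The `@ProcessElement` annotation, `UpperCaseFn`, and `ReflectiveInvoker` below are hypothetical stand-ins; in Beam you would use the SDK's own `DoFn.ProcessElement` annotation and let the runner do the invocation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical marker annotation; it must be retained at runtime so reflection can see it. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ProcessElement {}

/** A user function: nothing is overridden, one method is simply annotated. */
class UpperCaseFn {
  final List<String> outputs = new ArrayList<>();

  @ProcessElement
  public void process(String element) {
    outputs.add(element.toUpperCase(Locale.ROOT));
  }
}

final class ReflectiveInvoker {
  /** Finds a public method annotated with @ProcessElement (assumed to be unique) and invokes it. */
  static void invokeProcessElement(Object fn, Object element) {
    for (Method method : fn.getClass().getMethods()) {
      if (method.isAnnotationPresent(ProcessElement.class)) {
        try {
          method.invoke(fn, element);
        } catch (ReflectiveOperationException e) {
          throw new IllegalStateException("Failed to invoke " + method, e);
        }
        return;
      }
    }
    throw new IllegalArgumentException("No @ProcessElement method on " + fn.getClass().getName());
  }
}
```

The design choice this mirrors is that the framework, not the compiler, decides which method to call, so a function only declares the methods it actually needs.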

Re: Policy for stale PRs

2017-08-16 Thread Thomas Groh
JIRAs should only be closed if the issue that they track is no longer relevant (either by being fixed or by being determined not to be a problem). If a JIRA isn't being meaningfully worked on, it should be unassigned (in all cases, not just if there's an associated pull request that has not been work

Re: Adding back PipelineRunner#apply method

2017-08-15 Thread Thomas Groh
This style of method doesn't fit with the current approach to pipeline construction, where the PipelineRunner need not be specified until the pipeline is run; as such, the runner can't observe the construction of the Pipeline, because it may not exist yet while the Pipeline is being constructed. On Tue, Au

Re: [VOTE] Release 2.1.0, release candidate #3

2017-08-11 Thread Thomas Groh
+1
Verified:
* java examples, examples-java8 generation with the archetype plugin
* WordCount on the Java DirectRunner
* WordCount on the Java DataflowRunner
* Complete Game Examples on the Java DataflowRunner
* Streaming Game Examples on the Java DirectRunner
On Fri, Aug 11, 2017 at 10:21 AM, M

Re: [ANNOUNCEMENT] New PMC members, August 2017 edition!

2017-08-11 Thread Thomas Groh
Congratulations to both of you! Looking forward to your continued contributions. On Fri, Aug 11, 2017 at 10:40 AM, Davor Bonaci wrote: > Please join me and the rest of Beam PMC in welcoming the following committers as our newest PMC members. They have significantly contributed to th

Re: [PROPOSAL] "Requires deterministic input"

2017-08-10 Thread Thomas Groh
> ...to partially apply before a GroupByKey. > On Thu, Aug 10, 2017 at 9:01 AM Tyler Akidau wrote: >> +1 to the annota

Re: [PROPOSAL] "Requires deterministic input"

2017-08-09 Thread Thomas Groh
...Aug 9, 2017 at 1:49 PM, Kenneth Knowles wrote: > On Wed, Aug 9, 2017 at 1:30 PM, Thomas Groh wrote: >> I have a minor concern that this may not work as expected for users that try to batch remote calls in `FinishBundle` - we should make sure we document

Re: [PROPOSAL] "Requires deterministic input"

2017-08-09 Thread Thomas Groh
+1 to the annotation-on-ProcessElement approach. ProcessElement is the minimum implementation requirement of a DoFn, and it is where processing logic that depends on characteristics of the inputs should live. It's a good way of signalling the requirements of the Fn, and letting the runner decide.

Re: Style of messages for checkArgument/checkNotNull in IOs

2017-07-28 Thread Thomas Groh
I'm in favor of the wording in the style of the first: it's an immediately actionable message that will have an associated stack trace, but will provide the parameter in plaintext so the caller doesn't have to dig through the invoked code; they can just look at the documentation. I've recently bee
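
A sketch of the message style being favored: the check names the method and parameter in plain text, so the caller can act on the error without reading the IO's source. `ExampleIO`, `withTable`, and the local `checkArgument` helper are hypothetical; the helper only mirrors the shape of Guava's `Preconditions.checkArgument(boolean, String, Object...)` so the example runs without dependencies.

```java
final class ExampleIO {
  /** Minimal stand-in for Guava's Preconditions.checkArgument(boolean, String, Object...). */
  static void checkArgument(boolean condition, String template, Object... args) {
    if (!condition) {
      throw new IllegalArgumentException(String.format(template, args));
    }
  }

  private final String table;

  private ExampleIO(String table) {
    this.table = table;
  }

  static ExampleIO read() {
    return new ExampleIO(null);
  }

  /** Returns a copy configured to read from the given table. */
  ExampleIO withTable(String table) {
    // Actionable: names the misused method and the offending parameter in plain text.
    checkArgument(table != null && !table.isEmpty(),
        "ExampleIO.read().withTable(table) called with null or empty table");
    return new ExampleIO(table);
  }

  String getTable() {
    return table;
  }
}
```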

Re: Requiring PTransform to set a coder on its resulting collections

2017-07-27 Thread Thomas Groh
> ...pect, especially since coding/decoding is often a dominant cost for such pipelines. > On Thu, Jul 27, 2017 at 11:00 AM, Thomas Groh wrote: >> +1 on getting rid of setCoder; just from a Java SDK perspective, my ideal world contains PCollections which don'

Re: Requiring PTransform to set a coder on its resulting collections

2017-07-27 Thread Thomas Groh
+1 on getting rid of setCoder; just from a Java SDK perspective, my ideal world contains PCollections which don't have a user-visible way to mutate them. My preference would be to use TypeDescriptors everywhere within Pipeline construction (where possible), and utilize the CoderRegistry everywhere

Re: Passing pipeline options into PTransforms and Filesystems in Python

2017-07-11 Thread Thomas Groh
We'd like to avoid giving PTransforms access to the pipeline options during pipeline construction. There are a few compelling reasons for this. The biggest one is that the context in which the pipeline is constructed and the context in which it executes may not be the same. As an example, if I

Mixed-Language Pipelines

2017-07-10 Thread Thomas Groh
Hey everyone; I've been working on a design for implementing multi-language pipelines within the Beam SDKs (also known as mix-and-match). This kind of pipeline lets us reuse transforms written in one language in any other language that supports the Runner API and the Fn API. Letting us write a tra

Re: [Proposal] Submitting pipelines to Runners in another language

2017-07-07 Thread Thomas Groh
I left a couple of comments. I'm looking forward to this - it's going to be a good step towards being able to execute any pipeline on any runner. On Thu, Jul 6, 2017 at 5:11 PM, Ahmet Altay wrote: > Thank you Sourabh. I added my comments as well and +1 to Kenn. > On Thu, Jul 6, 2017 at 2:21

Re: MergeBot is here!

2017-07-07 Thread Thomas Groh
Super duper cool. Very exciting. On Fri, Jul 7, 2017 at 1:40 PM, Ted Yu wrote: > For https://gitbox.apache.org/setup/ , after completing the first two steps, is there any action needed for "MFA Status" box ? > Cheers > On Fri, Jul 7, 2017 at 1:37 PM, Lukasz Cwik wrote: >> for i in ra

Re: How can I disable running Python SDK tests when testing my Java change?

2017-05-18 Thread Thomas Groh
Generally I pass "-am -amd -pl sdks/java/core" to my Maven invocation. -pl selects the module to build, -am indicates to also make all modules my target depends upon, and -amd indicates to also make all modules that depend on my target; so if you're only modifying Java, that should hit everything. If you're making a

Re: First stable release completed!

2017-05-17 Thread Thomas Groh
Well done all! On Wed, May 17, 2017 at 9:31 AM, Sourabh Bajaj <sourabhba...@google.com.invalid> wrote: > Congrats !! > On Wed, May 17, 2017 at 9:02 AM Mingmin Xu wrote: >> Congratulations to everyone! >> On Wed, May 17, 2017 at 8:36 AM, Dan Halperin wrote: >>> Great job

Re: [VOTE] First stable release: release candidate #4

2017-05-15 Thread Thomas Groh
+1 Since the last candidate, I've also run the game examples for a few hours on the DirectRunner and all's well. On Mon, May 15, 2017 at 9:16 AM, Lukasz Cwik wrote: > +1 (binding) > Pei, I filed https://issues.apache.org/jira/browse/BEAM-2283 about using a consistent strategy when dealing w

Re: First stable release: Acceptance criteria

2017-05-11 Thread Thomas Groh
I'm making sure the direct runner plays nice in a variety of scenarios (primarily the game examples at the moment; it's been a couple of hours and still going strong in streaming). On Thu, May 11, 2017 at 3:09 PM, Dan Halperin wrote: > I'm focusing on: > * user reported bugs (Avro, TextIO, MongoDb)

Re: Direct runner doesn't seem to finalize checkpoint "quickly"

2017-05-10 Thread Thomas Groh
I'm going to start with number two, because it's got an easy answer: When performing an unbounded read, the DirectRunner will finalize a checkpoint after it has completed a subsequent read from that split where at least one element was read. A bounded read from an unbounded source will never be fin
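
The finalization rule above can be sketched as a small state machine for a single split. This is an illustrative model, not DirectRunner code: `CheckpointTracker` and its string checkpoints are hypothetical, and the sketch only tracks when each checkpoint becomes eligible for finalization.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of checkpoint finalization for one unbounded split. */
final class CheckpointTracker {
  /** Checkpoints taken at the end of earlier reads that have not been finalized yet. */
  private final List<String> pending = new ArrayList<>();
  /** Checkpoints finalized so far, in order. */
  final List<String> finalized = new ArrayList<>();

  /**
   * Records a completed read from this split.
   *
   * @param elementsRead number of elements produced by the read
   * @param checkpoint the checkpoint taken when the read finished
   */
  void onReadCompleted(int elementsRead, String checkpoint) {
    if (elementsRead > 0) {
      // A subsequent read produced at least one element: earlier checkpoints can be finalized.
      finalized.addAll(pending);
      pending.clear();
    }
    // The new checkpoint waits for a later read that produces an element.
    pending.add(checkpoint);
  }
}
```

In the test below, `c1` and `c2` are finalized only once the third read returns elements; `c3` stays pending.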

Re: Pipeline termination in the unified Beam model

2017-05-10 Thread Thomas Groh
I think this is generally less of a big deal than suggested, for a pretty simple reason: all bounded pipelines are expected to terminate when they are complete. Almost all unbounded pipelines are expected to run until explicitly shut down. As a result, shutting down an unbounded pip

Re: Process for getting the first stable release out

2017-05-05 Thread Thomas Groh
I'm also +1 on the branch. It'll help us make sure that what we're getting in is what we need for the FSR (first stable release). On Fri, May 5, 2017 at 12:41 PM, Dan Halperin wrote: > I am +1 on cutting the branch, and the sentiment that we expect the first pancake

Re: How to control watermark when using BoundedSource

2017-05-04 Thread Thomas Groh
> ...confused about the "when available" behavior of the runner. Since the watermarks emitted by the BoundedSource will always be BoundedWindow.TIMESTAMP_MIN_VALUE except for the last watermark, how could the runner know when to trigger the computation on a window? > T

Re: Future processing time timers and final watermark

2017-05-04 Thread Thomas Groh
Generally you shouldn't need to hold the watermark. The fact that the input watermark of the DoFn has advanced to the final watermark (i.e. positive infinity) means that all of the windows expire. At this point, any window with buffered data that is not finished should have its remaining elements o
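
A sketch of the expiry check behind this reasoning. It assumes a window is expired once the input watermark is strictly past the window's maximum timestamp plus the allowed lateness, and models the final watermark (positive infinity) as `Long.MAX_VALUE`. Names are hypothetical and timestamps are plain `long` milliseconds.

```java
final class WindowExpiry {
  /** The final watermark ("positive infinity"), modelled as the largest long. */
  static final long FINAL_WATERMARK = Long.MAX_VALUE;

  /** True if a window with this max timestamp and allowed lateness (>= 0) has expired. */
  static boolean isExpired(long windowMaxTimestamp, long allowedLateness, long inputWatermark) {
    if (inputWatermark == FINAL_WATERMARK) {
      return true; // nothing can arrive after the final watermark, so every window is done
    }
    long gcTime;
    try {
      gcTime = Math.addExact(windowMaxTimestamp, allowedLateness);
    } catch (ArithmeticException overflow) {
      gcTime = Long.MAX_VALUE; // saturate for windows near the end of time
    }
    return inputWatermark > gcTime;
  }
}
```

At the final watermark every window reports expired regardless of its allowed lateness, which matches the reasoning above.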

Re: Congratulations Davor!

2017-05-04 Thread Thomas Groh
Congratulations! On Thu, May 4, 2017 at 7:56 AM, Thomas Weise wrote: > Congrats! > On Thu, May 4, 2017 at 7:53 AM, Sourabh Bajaj <sourabhba...@google.com.invalid> wrote: >> Congrats!! >> On Thu, May 4, 2017 at 7:48 AM Mingmin Xu wrote: >>> Congratulations @Davor!

Re: Status of our CI tools

2017-04-28 Thread Thomas Groh
+1! This will be really helpful when looking at my PRs; I basically get no signal from the current state of the GitHub UI, and this will restore it to giving me a very strong positive signal. On Fri, Apr 28, 2017 at 6:22 PM, Davor Bonaci wrote: > Early on in the project, we've discussed our CI n

Re: How to control watermark when using BoundedSource

2017-04-28 Thread Thomas Groh
You can't directly control the watermark that a BoundedSource emits. Windowing into FixedWindows will still work as you expect, however: your elements will be assigned to their windows based on the time the event occurred. Depending on the runner, triggers may be run either "when available" or afte
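
Event-time window assignment depends only on each element's timestamp, not on the source's watermark. A sketch of fixed-window assignment, assuming windows of `sizeMillis` aligned to the epoch (offset zero); `FixedWindowAssignment` is hypothetical, not Beam's `FixedWindows` class.

```java
final class FixedWindowAssignment {
  /** Inclusive start of the fixed window containing the timestamp; floorMod keeps negatives correct. */
  static long windowStart(long timestampMillis, long sizeMillis) {
    if (sizeMillis <= 0) {
      throw new IllegalArgumentException("sizeMillis must be positive");
    }
    return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
  }

  /** Exclusive end of that window. */
  static long windowEnd(long timestampMillis, long sizeMillis) {
    return windowStart(timestampMillis, sizeMillis) + sizeMillis;
  }
}
```

For one-minute windows, an element at 125,000 ms lands in [120,000, 180,000) whatever the watermark happens to be when it is read.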

Re: Can application specify how watermarks should be generated?

2017-04-25 Thread Thomas Groh
getCurrentTimestamp returns the timestamp of the current element. Both Bounded and Unbounded Readers have this method. For a bounded source, this is safe - the source watermark can be held to negative infinity while elements remain in the source and advance to positive infinity after all elements are read,
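
The bounded-source watermark policy described here fits in a few lines. This sketch models negative and positive infinity as `Long.MIN_VALUE` and `Long.MAX_VALUE`, an assumption for illustration only. Because the watermark stays at negative infinity while anything remains unread, no element timestamp returned by `getCurrentTimestamp` can be behind it.

```java
final class BoundedSourceWatermark {
  static final long NEGATIVE_INFINITY = Long.MIN_VALUE;
  static final long POSITIVE_INFINITY = Long.MAX_VALUE;

  /** Watermark of a bounded source, given how many elements are still unread. */
  static long watermark(int elementsRemaining) {
    return elementsRemaining > 0 ? NEGATIVE_INFINITY : POSITIVE_INFINITY;
  }

  /** An element is late if its timestamp is strictly before the current watermark. */
  static boolean isLate(long elementTimestamp, long watermark) {
    return elementTimestamp < watermark;
  }
}
```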

Re: [PROPOSAL] Remove KeyedCombineFn

2017-04-21 Thread Thomas Groh
A happy +1. This simplifies the code base, and if we find a compelling use, it shouldn't be too bad to add it back in. On Fri, Apr 21, 2017 at 10:24 AM, Kenneth Knowles wrote: > Hi all, > I propose that we remove KeyedCombineFn before the first stable release. > I don't think it adds enough

Re: Renaming SideOutput

2017-04-13 Thread Thomas Groh
> ...k wrote: >> I agree, I'll create a PR with the doc changes (the rename + text changes to make things more clear). I know of at least 2 places we refer to side outputs (programming guide and the "Design your pipeline" page).

Re: Renaming SideOutput

2017-04-12 Thread Thomas Groh
Cool! I've filed https://issues.apache.org/jira/browse/BEAM-1949 and authored https://github.com/apache/beam/pull/2512 to make this change. On Tue, Apr 11, 2017 at 11:33 PM, Ted Yu wrote: > +1 > On Apr 11, 2017, at 5:34 PM, Thomas Groh wrote: >> I t

Re: Renaming SideOutput

2017-04-11 Thread Thomas Groh
> ...+1, I think this is a lot clearer. > On Tue, Apr 11, 2017 at 2:24 PM, Stephen Sisk wrote: >> strong +1 for changing the name away from sideOutput - the fact that sideInput and sideOutput are not re

Renaming SideOutput

2017-04-11 Thread Thomas Groh
Hey everyone: I'd like to rename DoFn.Context#sideOutput to #output (in the Java SDK). Having two methods, both named output, one that takes the "main output type" and one that takes a tag to specify the type, more clearly communicates the actual behavior - sideOutput isn't a "special" way to out
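
A minimal sketch of the proposed shape: one `output` overload for the main output and one that takes a tag, so additional outputs no longer go through a separately named "side" method. `OutputTag` and `OutputContext` are hypothetical stand-ins, not Beam's `TupleTag` or DoFn context classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tag type; a real SDK tag carries more (for example, type information). */
final class OutputTag<T> {
  final String id;

  OutputTag(String id) {
    this.id = id;
  }
}

/** Sketch of a context in which every output, main or not, is just an output(...) call. */
final class OutputContext<MainT> {
  private final OutputTag<MainT> mainTag;
  private final Map<String, List<Object>> outputs = new HashMap<>();

  OutputContext(OutputTag<MainT> mainTag) {
    this.mainTag = mainTag;
  }

  /** Emits to the main output. */
  void output(MainT value) {
    output(mainTag, value);
  }

  /** Emits to whichever output the tag names. */
  <T> void output(OutputTag<T> tag, T value) {
    outputs.computeIfAbsent(tag.id, k -> new ArrayList<>()).add(value);
  }

  /** Everything emitted to the given tag so far. */
  List<Object> get(OutputTag<?> tag) {
    return outputs.getOrDefault(tag.id, new ArrayList<>());
  }
}
```

The main-output overload simply delegates to the tagged one, which is the point of the rename: there is one way to output, parameterized by tag.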

Re: Combine.Global

2017-04-10 Thread Thomas Groh
This looks like it might be because the output coder cannot be determined. It looks like the registry understands that it must build a KvCoder, but cannot infer the coder for OutputT. More specifically, within the stack trace, the following line occurs: "Unable to provide a default Coder for java.

Re: Behaviour of watermarks in the presence of WithTimestamps

2017-04-06 Thread Thomas Groh
Hey Matt; Generally this is an unsolved problem. We track a related issue in https://issues.apache.org/jira/browse/BEAM-644, which would permit the watermark to be shifted backwards in time. That would let a source that does not support timestamps emit elements timestamped with "when I read the elem

Re: IO IT Patterns: Simplifying data loading

2017-04-03 Thread Thomas Groh
+1! I really like this approach; it lets us test for consistency without having to reimplement parts of the IO to actually load our data. I'd also like to note a few things which I think will be required when we want to expand this framework and style to also handle Unbounded IOs. Obviously, unbo

Re: Spec cleanup for Finalize Checkpoint

2017-03-29 Thread Thomas Groh
(Short URL: https://s.apache.org/FIWQ) On Wed, Mar 29, 2017 at 1:15 PM, Thomas Groh wrote: > Hey everyone, > We've had a few bugs recently in the DirectRunner based around finalizing checkpoints, as well as a bit of confusion on what should be permitted from within

Spec cleanup for Finalize Checkpoint

2017-03-29 Thread Thomas Groh
Hey everyone, We've had a few bugs recently in the DirectRunner based around finalizing checkpoints, as well as a bit of confusion on what should be permitted from within a checkpoint. Those caused some revisiting of the checkpoint spec, both to make sure we have written down what a runner is mean

Re: [PROPOSAL] @OnWindowExpiration

2017-03-29 Thread Thomas Groh
+1 The fact that we have this ability already (including all of the required information), just in a roundabout way by manually dredging in the allowed lateness, means that this isn't a huge burden to implement on an SDK or runner side; meanwhile, this much more strongly communicates what a user i

Side-Channel Inputs and an SDK

2017-03-24 Thread Thomas Groh
Hey everyone; I have a quick one-pager on PTransform capabilities and the ability for a PTransform to receive inputs via a side channel. This is a low-impact change for any existing user or runner author, but since it hits every PTransform and is required for the runner API, I thought you all migh

Re: [ANNOUNCEMENT] New committers, March 2017 edition!

2017-03-17 Thread Thomas Groh
Well done, congratulations, and welcome, everyone! On Fri, Mar 17, 2017 at 2:13 PM, Davor Bonaci wrote: > Please join me and the rest of Beam PMC in welcoming the following contributors as our newest committers. They have significantly contributed to the project in different ways, and we loo

Re: Release 0.6.0

2017-03-03 Thread Thomas Groh
Hey everyone; The submission of Surgery for the Dataflow Runner in the Java SDK has broken all streaming jobs that use Side Inputs in that runner. I'm working on a fix, ETA later today. I'd like to block the release on that. Sorry for the late notification. https://issues.apache.org/jira/browse/

Re: Pipeline termination in the unified Beam model

2017-03-01 Thread Thomas Groh
+1 I think it's a fair claim that a PCollection is "done" when its watermark reaches positive infinity, and then it's easy to claim that a Pipeline is "done" when all of its PCollections are done. Completion is an especially reasonable claim if we consider positive infinity to be an actual infini
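
The definition of "done" above can be written down directly. This sketch models positive infinity as `Long.MAX_VALUE` and a pipeline as a map from hypothetical PCollection names to their current watermarks.

```java
import java.util.Map;

final class PipelineCompletion {
  /** Positive infinity, modelled as the largest long. */
  static final long POSITIVE_INFINITY = Long.MAX_VALUE;

  /** A PCollection is done once its watermark reaches positive infinity. */
  static boolean isDone(long watermark) {
    return watermark == POSITIVE_INFINITY;
  }

  /** A pipeline is done once every one of its PCollections is done. */
  static boolean isPipelineDone(Map<String, Long> watermarksByPCollection) {
    return watermarksByPCollection.values().stream().allMatch(w -> isDone(w));
  }
}
```

Note that a pipeline with no PCollections is trivially done here, because `allMatch` on an empty stream returns true.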

Re: [DISCUSS] Per-Key Watermark Maintenance

2017-02-28 Thread Thomas Groh
I think that a per-key watermark is not just consistent with the model, but that there's also an argument to be made that it is the correct way to conceive of watermarks in Beam. The way we currently hold watermarks inside of ReduceFnRunner is via a WatermarkHold, which is set per-key. As a result, the

Re: Pipeline Surgery and an interception-free future

2017-02-17 Thread Thomas Groh
> ...rces translating bounded reads by the unbounded translator and it feels awkward, this makes it right again. > Thanks Thomas! > On Thu, Feb 16, 2017 at 4:12 PM Aljoscha Krettek wrote: >> I might just try and do that. ;-)

Re: We've hit 2000 PRs!

2017-02-16 Thread Thomas Groh
Impressive work everyone. Very cool. On Thu, Feb 16, 2017 at 8:05 AM, Dan Halperin wrote: > Checking my previous claims: > PR #1: Feb 26, 2016 > PR #1000: Sep 24, 2016 (211 days later) > PR #2000: Feb 13, 2017 (142 days later) Yep -- much quicker! > I'm excited to see this community growing

Pipeline Surgery and an interception-free future

2017-02-15 Thread Thomas Groh
As of GitHub PR #1998 (https://github.com/apache/beam/pull/1998), the new Pipeline Surgery API is ready and available. There are a couple of refinements coming in PR #2006, but in general pipelines can now, after construction, have PTransforms swapped out for whatever the runner desires (standard "be
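
A self-contained sketch of the idea behind surgery: after construction, each override pairs a matcher with a replacement factory, and matching transforms are swapped out. `TransformNode`, `TransformReplacement`, and `Surgery` are hypothetical and are not the SDK's override classes; this sketch applies the overrides in list order and leaves the original graph untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical node in an already-constructed pipeline graph; only a name is modelled. */
final class TransformNode {
  final String name;

  TransformNode(String name) {
    this.name = name;
  }
}

/** Hypothetical override: which nodes to match, and how to build each replacement. */
final class TransformReplacement {
  final Predicate<TransformNode> matcher;
  final Function<TransformNode, TransformNode> factory;

  TransformReplacement(Predicate<TransformNode> matcher, Function<TransformNode, TransformNode> factory) {
    this.matcher = matcher;
    this.factory = factory;
  }
}

final class Surgery {
  /** Returns a new graph in which every node matched by an override is replaced, overrides applied in order. */
  static List<TransformNode> replaceAll(List<TransformNode> graph, List<TransformReplacement> overrides) {
    List<TransformNode> current = new ArrayList<>(graph);
    for (TransformReplacement override : overrides) {
      for (int i = 0; i < current.size(); i++) {
        if (override.matcher.test(current.get(i))) {
          current.set(i, override.factory.apply(current.get(i)));
        }
      }
    }
    return current;
  }
}
```

Because replacement happens on the finished graph, the runner does not need to intercept `apply` while the user is building the pipeline.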

Re: [ANNOUNCEMENT] New committers, January 2017 edition!

2017-01-27 Thread Thomas Groh
Congratulations all! On Fri, Jan 27, 2017 at 9:34 AM, Chamikara Jayalath wrote: > Congrats all !! :) - Cham > On Fri, Jan 27, 2017 at 4:13 AM Stas Levin wrote: >> Thanks all, glad to be joining! >> On Fri, Jan 27, 2017, 13:07 Aljoscha Krettek wrote: >>> Welcome aboard! :-

Re: Default Timestamp and Watermark

2017-01-26 Thread Thomas Groh
The default timestamp should be BoundedWindow.TIMESTAMP_MIN_VALUE, which is equivalent to -2**63 microseconds. We also occasionally refer to this timestamp as "negative infinity". The default watermark policy for a bounded source should be negative infinity until all of the data is read, then posi
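
A quick check of what -2**63 microseconds means at millisecond precision, assuming (as in the Java SDK, which uses Joda-Time `Instant`s) that timestamps are stored in milliseconds. The constant names are hypothetical; only the arithmetic is the point.

```java
import java.util.concurrent.TimeUnit;

final class MinTimestamp {
  /** -2**63 microseconds: the smallest long, read as a count of microseconds. */
  static final long MIN_MICROS = Long.MIN_VALUE;

  /** The same bound in milliseconds; TimeUnit conversion truncates toward zero. */
  static final long MIN_MILLIS = TimeUnit.MICROSECONDS.toMillis(MIN_MICROS);
}
```

This works out to -9223372036854775 milliseconds, roughly 292,000 years before the epoch.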

Re: Conceptually, what are bundles?

2017-01-25 Thread Thomas Groh
I have a couple of points in addition to what Robert said. Runners are permitted to determine bundle sizes as appropriate to their implementation, so long as bundles are atomically committed. The contents of a PCollection are independent of the bundling of that PCollection. Runners can process all
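
A small illustration of "contents are independent of bundling": split the same elements into bundles of different sizes, commit every bundle, and the union is unchanged. `Bundling` is hypothetical; it keeps element order only so the comparison is simple, whereas a PCollection itself is unordered.

```java
import java.util.ArrayList;
import java.util.List;

final class Bundling {
  /** Splits elements into consecutive bundles of at most bundleSize elements. */
  static <T> List<List<T>> intoBundles(List<T> elements, int bundleSize) {
    if (bundleSize <= 0) {
      throw new IllegalArgumentException("bundleSize must be positive");
    }
    List<List<T>> bundles = new ArrayList<>();
    for (int start = 0; start < elements.size(); start += bundleSize) {
      int end = Math.min(start + bundleSize, elements.size());
      bundles.add(new ArrayList<>(elements.subList(start, end)));
    }
    return bundles;
  }

  /** Concatenates all committed bundles back into the collection's contents. */
  static <T> List<T> contents(List<List<T>> committedBundles) {
    List<T> all = new ArrayList<>();
    for (List<T> bundle : committedBundles) {
      all.addAll(bundle);
    }
    return all;
  }
}
```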

Composite Types and the Runner API

2017-01-17 Thread Thomas Groh
Hey everyone; I've been working on parts of the runner API recently, and part of that has included a shift in how composite inputs and outputs must be represented by the time a PipelineRunner begins to access them. I have a PR that completes this work within the Java SDK, but wanted to ensure that

Re: Graduation!

2017-01-11 Thread Thomas Groh
This is sweet. Congratulations to everyone involved in making this happen, and I'm excited to see where we go from here. On Wed, Jan 11, 2017 at 5:09 AM, amarouni wrote: > Congratulations to everyone on this important milestone. > On 11/01/2017 11:52, Neelesh Salian wrote: >> Congratulation