Re: How to serialize/deserialize a Pipeline object?

2016-12-21 Thread Kenneth Knowles
> > Choosing an intermediate representation that can be serialized and > sent to a cloud service (where it is then translated into the actual > implementation representation) is a fine solution. In fact that's what > Dataflow itself does. > > Of course we'll want to

Re: How to serialize/deserialize a Pipeline object?

2016-12-21 Thread Kenneth Knowles
Hi Shen, I want to tell you (1) how things work today and (2) how we want them to be eventually. (1) So far, each runner translates the Pipeline to their own graph format before serialization, so we have not yet encountered this issue. (2) We intend to make a standard mostly-readable JSON format

Jenkins seed job breakage

2016-12-19 Thread Kenneth Knowles
Hi all, The massive Jenkins breakage just now was me updating the seed job in unfriendly ways. It should be all cleared up now. Apologies for that. I'll be trying to come up with safer ways to validate such changes in the future. Kenn

Re: Build failed in Jenkins: beam_SeedJob_Main #43

2016-12-19 Thread Kenneth Knowles
Context: PR #1640 has its LGTM. Before committing it, I am ensuring it works by running the seed job against it. This _will_ change the other jobs if/when it succeeds, but it will change them to what they are about to be anyhow. The failure here is not substantive. I built against origin/pr/1640 i

Re: Jenkins build became unstable: beam_Release_NightlySnapshot #269

2016-12-19 Thread Kenneth Knowles
This was an error in the Dataflow integration tests that is not related to any changes to Beam. On Sun, Dec 18, 2016 at 11:31 PM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See NightlySnapshot/269/changes> > >

Re: [VOTE] Release 0.4.0-incubating, release candidate #3

2016-12-17 Thread Kenneth Knowles
+1, as long as it is fine for the release to be signed by a PMC member other than the release manager. Otherwise need to replace the .asc file. Following [Apache release checklist]( http://incubator.apache.org/guides/releasemanagement.html#check-list): 1.1 Verified checksums & signature (Davor's)

Re: Jenkins build became unstable: beam_PostCommit_Java_MavenInstall #2124

2016-12-16 Thread Kenneth Knowles
Prior to this coming in, it was already mostly rolled forwards in Dataflow. It would actually be counterproductive to revert in Beam as that would re-introduce the same bug in reverse. Filed BEAM-1172 to prevent in the future. On Fri, Dec 16, 2016 at 2:18 PM, Apache Jenkins Server < jenk...@build

Re: Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Apex #4

2016-12-15 Thread Kenneth Knowles
as it runs as > part of the unit tests and provides basic coverage early on. If you have > already ideas what is wrong with it please let me know. This test was seen > as flaky before. > > Thanks, > Thomas > > > On Thu, Dec 15, 2016 at 12:53 PM, Kenneth Knowles > wrote

Re: Jenkins build became unstable: beam_PostCommit_Java_RunnableOnService_Apex #4

2016-12-15 Thread Kenneth Knowles
This build job is new - I just added it - so it is expected that we'll have to shake some stuff out. Looking at the failure, it is in org.apache.beam.runners.apex.examples.WordCountTest I have ideas what is wrong - and I think it is just the test - but this is now redundant with how we run org.ap

Re: [VOTE] Release 0.4.0-incubating, release candidate #1

2016-12-15 Thread Kenneth Knowles
ld prefer to cut a RC2. > > Regards > JB > > On Dec 15, 2016, 20:06, at 20:06, Kenneth Knowles > wrote: > >Agreed. I had thought the issue in PR #1620 only affected Dataflow (in > >which > >case we could address it in the service) but it now also affects t

Re: [VOTE] Release 0.4.0-incubating, release candidate #1

2016-12-15 Thread Kenneth Knowles
Agreed. I had thought the issue in PR #1620 only affected Dataflow (in which case we could address it in the service) but it now also affects the Flink runner, so it should be included in the release. On Thu, Dec 15, 2016 at 10:46 AM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > There

Re: New testSideInputsWithMultipleWindows and should DoFnRunner explode if DoFn contains a side input ?

2016-12-14 Thread Kenneth Knowles
Yes, this is a bug in SimpleDoFnRunner (or maybe a matter of clarity on whether or not it owns this), not in the Spark runner. FWIW the test is definitely correct, and runners-core has had this bug for a while. It is https://issues.apache.org/jira/browse/BEAM-1149 and I'm on it. On Wed, Dec 14, 2016 at 11:0

Re: Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Spark #409

2016-12-14 Thread Kenneth Knowles
This is still https://issues.apache.org/jira/browse/BEAM-1149. We recently added a test for it. The actual behavior has been broken for everyone for a while. It is half-fixed by Eugene K. (some DoFnRunners) but not all. On Wed, Dec 14, 2016 at 10:51 AM, Apache Jenkins Server < jenk...@builds.apach

Re: Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #1806

2016-12-13 Thread Kenneth Knowles
Failure in https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_RunnableOnService_Dataflow/1806/ is caused by https://github.com/apache/incubator-beam/pull/1541, which I am reverting. On Tue, Dec 13, 2016 at 3:16 PM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See

Re: [PROPOSAL] "IOChannelFactory" Redesign and Make it Configurable

2016-12-13 Thread Kenneth Knowles
d suggestions are welcome. > > Thanks > -- > Pei > > --- > [1]: > https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit?disco=AAAAA30vtPU#heading=h.p3gc3colc2cs > > [2]: > https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJ

Re: Beam Tuple

2016-12-13 Thread Kenneth Knowles
If the scope is really just tuples, then supposing a user chooses to go with Apache Commons tuples or javatuples it seems that the problem to be solved is easily providing coders for common data types that are not part of Beam. I think we should address this anyhow. The scope of having a common fo

Re: Jenkins pre/postcommit increased from 35m to 60m+ on Friday

2016-12-12 Thread Kenneth Knowles
--- > > [INFO] BUILD SUCCESS > [INFO] > -------- > [INFO] Total time: 34:30 min > [INFO] Finished at: 2016-12-09T18:50:49+00:00 > [INFO] Final Memory: 196M/1051M > [INFO]

Re: examples-java8 tests running slow

2016-12-12 Thread Kenneth Knowles
Yes, they are a bit harder to get fine-tuned executions. But they should only be run in the integration-test phase, not with unit tests. Is this happening when you run them locally or in Jenkins? On Mon, Dec 12, 2016 at 5:06 PM, Manu Zhang wrote: > Sorry, they are tests under *maven-archetypes/e

Jenkins pre/postcommit increased from 35m to 60m+ on Friday

2016-12-12 Thread Kenneth Knowles
Hi all, We have a huge Jenkins backlog, surely exacerbated by the fact that our test time (precommit and postcommit mvn install) has roughly doubled in the last few days. Here's the quick link to the trend: https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/buildTimeTrend

Re: New DoFn and WindowedValue/WinowingInternals

2016-12-11 Thread Kenneth Knowles
You've got it right. My recommendation is to just directly implement it for the Spark runner. It will often actually clean things up a bit. Here's the analogous change for the Flink runner: https://github.com/apache/incubator-beam/pull/1435/files. With GABW, I tried going through the process of k

Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

2016-12-08 Thread Kenneth Knowles
be made, or that they were all "ready to go". > > Are there any? If so, we should block the next release. > > On Fri, Dec 9, 2016 at 1:58 AM, Kenneth Knowles > wrote: > > > Thanks all! This has been done. > > > > On Thu, Dec 8, 2016 at 3:37 AM, Amit S

Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

2016-12-08 Thread Kenneth Knowles
> +1 > > > > > > On Thu, Dec 8, 2016 at 1:10 PM Jean-Baptiste Onofré > > > wrote: > > > > > > > +1 > > > > > > > > Regards > > > > JB > > > > > > > > On 12/07/2016 10:37 PM, Kenneth Know

[DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

2016-12-07 Thread Kenneth Knowles
Hi all, I want to bring up another major backwards-incompatible change before it is too late, to resolve [BEAM-438]. Summary: Leave PInput.apply the same but rename PTransform.apply to PTransform.expand. I have opened [PR #1538] just for reference (it took 30 seconds using IDE automated refactor)

Re: [PROPOSAL] "IOChannelFactory" Redesign and Make it Configurable

2016-12-06 Thread Kenneth Knowles
Thanks for the thorough answers. It all sounds good to me. On Tue, Dec 6, 2016 at 12:57 PM, Pei He wrote: > Thanks Kenn for the feedback and questions. > > I responded inline. > > On Mon, Dec 5, 2016 at 7:49 PM, Kenneth Knowles > wrote: > > > I really like this docum

Re: [PROPOSAL] "IOChannelFactory" Redesign and Make it Configurable

2016-12-05 Thread Kenneth Knowles
I really like this document. It is easy to read and informative. Three things not addressed by the document: 1. Major Beam use cases. I'm sure we have a few in the SDK that could be outlined in terms of the new API with pseudocode. 2. Related work. How does this differ from other filesystem APIs a

Re: Jenkins build is unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #1730

2016-12-05 Thread Kenneth Knowles
The error message looks like a transient error, though it is easy to believe this change could cause a problem. I will keep a sharp eye on it. On Mon, Dec 5, 2016 at 4:21 PM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See Run

Re: [DISCUSS] ExecIO

2016-12-05 Thread Kenneth Knowles
racting with the shell. > > Requirements for the executor can be specified with an annotation on the > parameter or via an annotation within the DoFn. > > On Mon, Dec 5, 2016 at 1:15 PM Kenneth Knowles > wrote: > > > I would like the runner-independent, language-independ

Re: [DISCUSS] ExecIO

2016-12-05 Thread Kenneth Knowles
I would like the runner-independent, language-independent graph to have a way to specify requirements on the environment that a DoFn runs in. This would provide a natural way to talk about installed libraries, containers, external services that are accessed, etc, and I think the requirement of a pa

Re: PAssertTest#runExpectingAssertionFailure() and waitUntilFinish()

2016-12-05 Thread Kenneth Knowles
Hi Stas, This is something special to TestPipeline and the test configuration for a runner. If runExpectingAssertionFailure() does not succeed, then our whole suite of RunnableOnService tests is not going to work, because they all have an assumption that TestPipeline#run() waits until the asserti

Jenkins precommit worker affinity

2016-11-30 Thread Kenneth Knowles
It appears that the new job beam_PreCommit_Java_MavenInstall has an affinity for Jenkins worker beam3 while workers beam1 and beam2 sit idle. Is this intentional? There seems to be a backlog of half a dozen builds.

Re: Questions about coders

2016-11-30 Thread Kenneth Knowles
On Wed, Nov 30, 2016 at 3:52 PM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > Hello, > > Do we have anywhere a set of recommendations for developing new coders? I'm > confused by a couple of things: > > - Why are coders serialized by JSON serialization instead of by regular > Java ser

Re: Jenkins build is still unstable: beam_PostCommit_MavenVerify #1948

2016-11-30 Thread Kenneth Knowles
This is a Dataflow-specific linking error. I am investigating and proceeding with a temporary rollback. On Wed, Nov 30, 2016 at 3:06 PM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See > >

Re: Jenkins build became unstable: beam_Release_NightlySnapshot #249

2016-11-30 Thread Kenneth Knowles
This looks like it might have been the sort of thing that #1189 (just merged) will fix. On Tue, Nov 29, 2016 at 11:29 PM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See NightlySnap

Re: Jenkins build became unstable: beam_PostCommit_RunnableOnService_GoogleCloudDataflow #1668

2016-11-28 Thread Kenneth Knowles
This was also due to premature commit of tests for stateful ParDo. Rolling forward, fix is https://github.com/apache/incubator-beam/pull/1411. On Mon, Nov 28, 2016 at 2:21 PM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See

Re: Build failed in Jenkins: beam_Release_NightlySnapshot #242

2016-11-23 Thread Kenneth Knowles
This failure is related to [BEAM-1043] and expected fixed by the now-merged [#1418]. [BEAM-1043]: https://issues.apache.org/jira/browse/BEAM-1043 [#1418]: https://github.com/apache/incubator-beam/pull/1418 On Wed, Nov 23, 2016 at 9:24 AM, Davor Bonaci wrote: > The dependency analysis seem to h

Re: [DISCUSS] Graduation to a top-level project

2016-11-22 Thread Kenneth Knowles
+1 !!! I especially love how the diversity of the community has contributed to the conceptual growth and quality of Beam. I can't wait for more! On Tue, Nov 22, 2016 at 11:22 AM, Thomas Groh wrote: > +1 > > It's been a thrilling experience thus far, and I'm excited for the future. > > On Tue, N

Re: Jenkins skipping PreCommit for a PR caused build failures on master.

2016-11-15 Thread Kenneth Knowles
rat-maven-plugin/1.0-alpha-3 > > Dan > > On Wed, Nov 16, 2016 at 12:22 AM, Kenneth Knowles > wrote: > > > I honestly do not understand what is going on with our RAT set up. I'd > also > > love some docs on it. > > > > Jenkins fails on https://gi

Re: Jenkins skipping PreCommit for a PR caused build failures on master.

2016-11-15 Thread Kenneth Knowles
am/pull/1364> #1332 > <https://github.com/apache/incubator-beam/pull/1332> > - Jenkins outage skipped (at least 1) PreCommit execution. > - A simple mvn install/verify won't execute checkstyle anymore, use: > "mvn -Prelease clean verify" > - Ken

Re: Configuring Jenkins

2016-11-15 Thread Kenneth Knowles
Awesome. This is a dramatic improvement. On Tue, Nov 15, 2016 at 8:52 AM, Amit Sela wrote: > Sweet! Versioning changes in a visible way can save a lot of pain.. > > Thanks Davor! > > On Tue, Nov 15, 2016, 18:47 Robert Bradshaw > wrote: > > > This is great; thanks for doing this! > > > > On Tue,

Re: Batcher DoFn

2016-11-14 Thread Kenneth Knowles
Hi Josh, I think you probably mean something like buffering elements in a field on the DoFn, emitting batches as appropriate, and emitting the remainder in finishBundle. Unfortunately there are two issues: - in the presence of windowing the DoFn might be invoked in different windows, so you'll
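
A minimal plain-Python sketch of the buffering pattern described above, adjusted for the first issue by keeping one buffer per window so that a batch never mixes windows. The names `WindowAwareBatcher`, `process`, and `flush` are hypothetical and are not the Beam API; `flush` only stands in for `finishBundle`. The second issue is cut off in this excerpt, and the sketch does not try to address it.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List


class WindowAwareBatcher:
    """Hypothetical sketch: buffer elements per window and emit fixed-size batches."""

    def __init__(self, batch_size: int, emit: Callable[[Hashable, List[Any]], None]):
        self.batch_size = batch_size
        self.emit = emit
        # One buffer per window, so elements from different windows never share a batch.
        self.buffers: Dict[Hashable, List[Any]] = defaultdict(list)

    def process(self, element: Any, window: Hashable) -> None:
        buf = self.buffers[window]
        buf.append(element)
        if len(buf) >= self.batch_size:
            self.emit(window, list(buf))  # emit a copy, then reuse the buffer
            buf.clear()

    def flush(self) -> None:
        # Stand-in for finishBundle: emit whatever remains, window by window.
        for window, buf in self.buffers.items():
            if buf:
                self.emit(window, list(buf))
        self.buffers.clear()
```

Feeding `(1, "w1"), (2, "w2"), (3, "w1"), (4, "w2"), (5, "w1")` with a batch size of 2 emits `[1, 3]` and `[2, 4]` during processing and the remainder `[5]` for `"w1"` on `flush`.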

Re: Introduction + contributing to docs

2016-11-11 Thread Kenneth Knowles
Welcome! It is great to witness the website really coming together. On Fri, Nov 11, 2016 at 12:35 PM, Amit Sela wrote: > Welcome Melissa! > > On Fri, Nov 11, 2016, 22:31 Jean-Baptiste Onofré wrote: > > > Hi Melissa, > > > > welcome aboard !! > > > > Regards > > JB > > > > On 11/11/2016 08:11 PM

Re: [PROPOSAL] Merge apex-runner to master branch

2016-11-11 Thread Kenneth Knowles
OK, I believe enough time has passed, and enough +1s, with caveats addressed agreeably, that we have reached consensus on this. LGTM! I'll limit technical details to the PR. On Fri, Nov 11, 2016 at 11:09 AM, Robert Bradshaw < rober...@google.com.invalid> wrote: > Thanks, David! +1 to getting this i

Re: [jira] [Created] (BEAM-961) CountingInput could have starting number

2016-11-10 Thread Kenneth Knowles
dive into the codebase. On Thu, Nov 10, 2016 at 1:23 PM, Dan Halperin wrote: > Why not support this in a follow-on pardo that shifts the range? > > On Thu, Nov 10, 2016 at 1:22 PM, Kenneth Knowles (JIRA) > wrote: > >> Kennet

Re: [DISCUSS] Change "RunnableOnService" To A More Intuitive Name

2016-11-10 Thread Kenneth Knowles
lly like to see is automatic derivation of the capability > > matrix from an extended Runner Test Suite. (As outlined in Thomas' doc). > > > > On Wed, 9 Nov 2016 at 21:42 Kenneth Knowles > > wrote: > > > > > Huge +1 to this. > > > > > >

Re: [DISCUSS] Change "RunnableOnService" To A More Intuitive Name

2016-11-09 Thread Kenneth Knowles
Huge +1 to this. The two categories I care most about are: 1. Tests that need a runner, but are testing the other "thing under test"; today this is NeedsRunner. 2. Tests that are intended to test a runner; today this is RunnableOnService. Actually the lines are not necessarily clear between them,

Re: SBT/ivy dependency issues

2016-11-09 Thread Kenneth Knowles
Hi Abbass, Seeing the output from `sbt dependency-tree` from the sbt-dependency-graph plugin [1] might help. (caveat: I did not try this out; I don't know the state of maintenance) Kenn [1] https://github.com/jrudolph/sbt-dependency-graph On Wed, Nov 9, 2016 at 6:33 AM, Jean-Baptiste Onofré wr

Re: PCollection to PCollection Conversion

2016-11-09 Thread Kenneth Knowles
>> for various reasons are not a part of the Apache Spark project: > >> https://spark-packages.org/. > >> > >> Maybe a "common-transformations" package would serve both users quick > >> ramp-up and ease-of-use while keeping Beam more "enabling"

Re: [PROPOSAL] Merge apex-runner to master branch

2016-11-09 Thread Kenneth Knowles
Hi Thomas, Very good point about establishing more clear definitions of the roles mentioned in the guidelines. Let's discuss in a separate thread. Kenn On Tue, Nov 8, 2016 at 1:03 PM, Thomas Weise wrote: > Thanks for the support. It may be helpful to describe the roles of > "maintainer" and "s

Re: PCollection to PCollection Conversion

2016-11-08 Thread Kenneth Knowles
It seems useful for small scale debugging / demoing to have Dump.toString(). I think it should be named to clearly indicate its limited scope. Maybe other stuff could go in the Dump namespace, but "Dump.toJson()" would be for humans to read - so it should be pretty printed, not treated as a machine

Re: [PROPOSAL] Merge apex-runner to master branch

2016-11-08 Thread Kenneth Knowles
+1, with enthusiasm. On Tue, Nov 8, 2016 at 9:16 AM, Davor Bonaci wrote: > +1 > > I'd treat this as an official vote on this procedural matter. > > On Tue, Nov 8, 2016 at 6:55 AM, Mukul Jain wrote: > > > +1 > > > > Awesome work Thomas! More runner choices the better. > > > > Best > > Mukul > >

Re: Verify a new Runner

2016-11-07 Thread Kenneth Knowles
Hi Zhixin, I would love to help you out with this. One of the best ways to test your runner is to enable the "RunnableOnService" test suite in the core SDK. Here is an example of the configuration for the Flink runner: https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/pom.

Re: Contributing to Beam docs

2016-11-03 Thread Kenneth Knowles
This is great. These menus seem really intuitive for finding what you need. I especially like the clarity in Get Started and Documentation. A pretty big challenge, since we have runners and SDKs that all need to be called out prominently in order to let users know what Beam is about. I had ~3 th

Re: PAssert.GroupedGlobally defaults to a single empty Iterable.

2016-11-02 Thread Kenneth Knowles
rror <0> but expected <100>. > Running ungracefully seemed to come back and bite me in the past, so I'm > trying to avoid it. > > Thoughts ? > > On Wed, Nov 2, 2016 at 6:00 PM Dan Halperin > wrote: > > (Also: I meant "tests will [incorrectly] pass s

Re: PAssert.GroupedGlobally defaults to a single empty Iterable.

2016-11-02 Thread Kenneth Knowles
The iterable is the entirety of the contents of the PCollection. So empty iterable -> empty PCollection. It is actually the main purpose/complexity of this transform to make sure it is non-empty, because otherwise downstream asserts do not run. On Wed, Nov 2, 2016 at 5:20 AM Amit Sela wrote: > I've
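
The point about PAssert.GroupedGlobally can be sketched in plain Python: collapsing a collection into exactly one iterable, even when the collection is empty, guarantees that a downstream check runs at least once. `group_globally` and `run_asserts` are hypothetical helpers for illustration, not PAssert's implementation.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def group_globally(elements: Iterable[T]) -> List[List[T]]:
    """Collapse the whole collection into exactly one (possibly empty) list.

    A plain group-by-key over zero elements would produce zero groups, so a
    downstream check would silently never run; always emitting one list avoids that.
    """
    return [list(elements)]


def run_asserts(grouped: Iterable[List[T]], check: Callable[[List[T]], None]) -> int:
    """Apply check to each grouped element and return how many times it ran."""
    runs = 0
    for contents in grouped:
        check(contents)
        runs += 1
    return runs
```

With a grouping that yields nothing, `run_asserts` returns 0 and a check expecting 100 elements never fires, which is the silent pass discussed in this thread; with `group_globally([])` the check runs once on `[]` and fails as it should.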

Re: PAssert.GroupedGlobally defaults to a single empty Iterable.

2016-11-02 Thread Kenneth Knowles
FWIW if the runner is set up properly the tests will still fail with a timeout waiting for the assertion aggregators to reach expected values. Unfortunately we haven't yet centralized this functionality into TestPipeline or thereabouts. On Wed, Nov 2, 2016 at 8:56 AM Dan Halperin wrote: > +Kenn

Re: Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-31 Thread Kenneth Knowles
Manu, I think your critique about user interface clarity is valid. CombineFn conflates a few operations and is not that clear about what it is doing or why. You seem to be concerned about CombineFn versus SerializableFunction constructors for the Combine family of transforms. I thought I'd respond

Re: migrating gearpump-runner to new DoFn fails with NotSerializableException

2016-10-30 Thread Kenneth Knowles
Hi Manu, That class is generated by DoFnInvokers, which generates bytecode to efficiently execute a DoFn. It should not be part of the serialized payload, but should be instantiated on the service/worker/etc. If you are trying to serialize a DoFnInvoker, then my recommendation is to serialize only
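
The general idea, that the generated invoker should not travel in the serialized payload but should be created where the DoFn runs, can be illustrated with a plain-Python analogue. Beam's DoFnInvokers generate Java bytecode, which this does not model; `FnHolder`, `DoubleFn`, and `invoker()` are hypothetical names, and a closure stands in for the generated class because `pickle` cannot serialize it, mirroring the NotSerializableException in the thread title. The recommendation in the message itself is truncated here, so this is only a sketch of the stated principle.

```python
import pickle
from typing import Any, Callable, Optional


class DoubleFn:
    """Hypothetical user function: plain and picklable."""

    def process(self, element: int) -> int:
        return element * 2


class FnHolder:
    """Ship only the user fn; rebuild the invoker lazily wherever it is deserialized."""

    def __init__(self, fn: Any):
        self.fn = fn
        self._invoker: Optional[Callable[[Any], Any]] = None

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_invoker"] = None  # never serialize the generated invoker
        return state

    def invoker(self) -> Callable[[Any], Any]:
        if self._invoker is None:
            # Stand-in for generated code: a local closure that pickle cannot serialize.
            fn = self.fn
            self._invoker = lambda element: fn.process(element)
        return self._invoker
```

Because `__getstate__` drops the invoker, `pickle.dumps(holder)` succeeds even after `holder.invoker()` has been called, and the deserialized copy builds its own invoker on first use.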

Re: [DISCUSS] Merging master -> feature branch

2016-10-27 Thread Kenneth Knowles
In the spirit of explicitly summarizing and concluding threads on list: I think we have affirmative consensus to go for it when a downstream integration is completely conflict-free and fixup-free. On Thu, Oct 27, 2016 at 12:43 PM Robert Bradshaw wrote: > My concern was mostly about what to do in

Re: [DISCUSS] Using Verbs for Transforms

2016-10-25 Thread Kenneth Knowles
51 PM Kenneth Knowles wrote: > I'd prefer to keep the vote focused on this rename, not a general policy. > > On Tue, Oct 25, 2016 at 10:26 PM Jean-Baptiste Onofré > wrote: > > Yes I would start a formal vote with the three proposals: descriptive > verb, adjective, verb

Re: [DISCUSS] Using Verbs for Transforms

2016-10-25 Thread Kenneth Knowles
I'd prefer to keep the vote focused on this rename, not a general policy. On Tue, Oct 25, 2016 at 10:26 PM Jean-Baptiste Onofré wrote: > Yes I would start a formal vote with the three proposals: descriptive > verb, adjective, verbs + adjective. > > Regards > JB > > On Oct 26, 2016, 07:16,

Re: Apex runner integration tests

2016-10-25 Thread Kenneth Knowles
I've commented on a PR but also want to respond here. In the precommit, we run https://builds.apache.org/job/beam_PreCommit_MavenVerify/ which uses -Pjenkins-precommit to select very few integration tests. It should just be unit tests and integration tests based on our examples. This catches the b

Re: [VOTE] Release 0.3.0-incubating, release candidate #1

2016-10-25 Thread Kenneth Knowles
+1 (binding) On Tue, Oct 25, 2016 at 5:26 PM Dan Halperin wrote: > My reading of the LEGAL threads is that since we are not including (shading > or bundling) the ASL-licensed code we are fine to distribute kinesis-io > module. This was the original conclusion that LEGAL-198 got to, and that > th

[DISCUSS] Merging master -> feature branch

2016-10-25 Thread Kenneth Knowles
Hi all, While collaborating on the apex-runner branch, the issue of how best to continuously merge master into the feature branch came up. IMO it differs somewhat from normal commits in two notable ways: 1. Modulo fix-ups, it is actually not adding any new code to the overall codebase, so reviews

Re: The Availability of PipelineOptions

2016-10-25 Thread Kenneth Knowles
In the spirit of some recent conversations about tracking proposals like this, are there JIRAs you can [file and then] mention on this thread? On Tue, Oct 25, 2016 at 2:07 PM Kenneth Knowles wrote: > Yea +1. Definitely a real prerequisite to a true runner-independent graph. > > On Tu

Re: The Availability of PipelineOptions

2016-10-25 Thread Kenneth Knowles
Yea +1. Definitely a real prerequisite to a true runner-independent graph. On Tue, Oct 25, 2016 at 1:24 PM Amit Sela wrote: > +1 > > On Tue, Oct 25, 2016 at 8:43 PM Robert Bradshaw > > wrote: > > > +1 > > > > On Tue, Oct 25, 2016 at 7:26 AM, Thomas Weise wrote: > > > +1 > > > > > > > > > On Tu

Re: [DISCUSS] Using Verbs for Transforms

2016-10-24 Thread Kenneth Knowles
The precedent that we use verbs has many exceptions. We have ApproximateQuantiles, Values, Keys, WithTimestamps, and I would even include Sum (at least when I read it). Historical note: the predilection towards verbs is from the Google Style Guide for Java method names

Re: [ANNOUNCEMENT] New committers!

2016-10-21 Thread Kenneth Knowles
Huzzah! I've personally enjoyed working together, and I am glad to extend this acknowledgement and welcome this addition to the Beam community. Kenn On Fri, Oct 21, 2016 at 3:18 PM Davor Bonaci wrote: > Hi everyone, > Please join me and the rest of Beam PPMC in welcoming the following > contri

Re: Placement of temporary files by FileBasedSink

2016-10-20 Thread Kenneth Knowles
gh. > > > On Thu, Oct 20, 2016 at 10:14 AM Robert Bradshaw > wrote: > > > On Thu, Oct 20, 2016 at 9:58 AM, Kenneth Knowles > > > wrote: > > > I like the spirit of proposal #1 for addressing the critical > duplication > > > problem, though as Dan

Re: Start of release 0.3.0-incubating

2016-10-20 Thread Kenneth Knowles
Aljoscha, I'm very interested in hearing how easy it is, or how fast we think it could get, from your perspective as first time release manager. The more frequent releases we have (eventually minor or patch version only) the less these concerns impact users. On Thu, Oct 20, 2016, 10:26 Jesse Ander

Re: Release Guide

2016-10-20 Thread Kenneth Knowles
This is really nice. Very readable and streamlined. On Thu, Oct 20, 2016 at 7:44 AM Aljoscha Krettek wrote: > Hi, > thanks for taking the time and writing this extensive doc! > > If no-one is against this I would like to be the release manager for the > next (0.3.0-incubating) release. I would w

Re: Placement of temporary files by FileBasedSink

2016-10-20 Thread Kenneth Knowles
I like the spirit of proposal #1 for addressing the critical duplication problem, though as Dan points out the logic to choose a related but collision-free name might be slightly more complex. It is a nice bonus that it addresses the less critical issues and improves usability for manual inspectio

Re: [DISCUSS] Sources and Runners

2016-10-19 Thread Kenneth Knowles
I wanted to pull out the sub-thread that isn't about testing, paraphrased: Amit: "Dan laid out these points: readers should return ASAP, runners may poll as they see fit [including quickly if they think the reader is in start-up time], runners need to be OK with startup delay" Raghu: "What is the

Re: [KUDOS] Contributed runner: Apache Apex!

2016-10-17 Thread Kenneth Knowles
*I would like to :-) On Mon, Oct 17, 2016 at 9:51 AM Kenneth Knowles wrote: > Hi all, > > I would to, once again, call attention to a great addition to Beam: a > runner for Apache Apex. > > After lots of review and much thoughtful revision, pull request #540 has > been merg

[KUDOS] Contributed runner: Apache Apex!

2016-10-17 Thread Kenneth Knowles
Hi all, I would to, once again, call attention to a great addition to Beam: a runner for Apache Apex. After lots of review and much thoughtful revision, pull request #540 has been merged to the apex-runner feature branch today. Please do take a look, and help us put the finishing touches on it to

Re: [PROPOSAL] State and Timers for DoFn (aka per-key workflows)

2016-10-14 Thread Kenneth Knowles
n-Baptiste Onofré wrote: > +1 > > It sounds very good. > > Regards > JB > > On 07/27/2016 05:20 AM, Kenneth Knowles wrote: > > Hi everyone, > > > > > > I would like to offer a proposal for a much-requested feature in Beam: > > Stateful

Re: Specifying type arguments for generic PTransform builders

2016-10-13 Thread Kenneth Knowles
48 PM, Eugene Kirpichov < > kirpic...@google.com.invalid> wrote: > > > In my original email, all FooBuilder's should be simply Foo. Sorry for > the > > confusion. > > > > On Thu, Oct 6, 2016 at 3:08 PM Kenneth Knowles > > wrote: > > >

Re: [PROPOSAL] Splittable DoFn - Replacing the Source API with non-monolithic element processing in DoFn

2016-10-12 Thread Kenneth Knowles
This is awesome. Couple of comments on follow up ideas. On Wed, Oct 12, 2016 at 5:56 PM Eugene Kirpichov wrote: > - It adds a mostly runner-agnostic expansion of the ParDo transform for a > splittable DoFn, with one runner-specific primitive transform that needs to > be overridden by every runne

Re: Simplifying User-Defined Metrics in Beam

2016-10-12 Thread Kenneth Knowles
Correction: In my eagerness to see the end of aggregators, I mistook the intention. Both A and B leave aggregators in place until there is a replacement. In which case, I am strongly in favor of B. As soon as we can remove aggregators, I think we should. On Wed, Oct 12, 2016 at 10:48 AM Kenneth

Re: Simplifying User-Defined Metrics in Beam

2016-10-12 Thread Kenneth Knowles
Huzzah! This is IMO a really great change. I agree that we can get something in to allow work to continue, and improve the API as we learn. On Wed, Oct 12, 2016 at 10:20 AM Ben Chambers wrote: > 3. One open question is what to do with Aggregators. In the doc I mentioned that long term I'd like

Re: Introducing a Redistribute transform

2016-10-11 Thread Kenneth Knowles
On Tue, Oct 11, 2016 at 10:56 AM Eugene Kirpichov wrote: > Yeah, I'm starting to lean towards removing Redistribute.byKey() from the > public API - because it only makes sense for getting access to per-key > state, and 1) we don't have it yet and 2) runner should insert it > automatically - so th

Re: Introducing a Redistribute transform

2016-10-11 Thread Kenneth Knowles
On Mon, Oct 10, 2016 at 1:38 PM Eugene Kirpichov wrote: > The transform, the way it's implemented, actually does several things at > the same time and that's why it's tricky to document it. > This thread has actually made me less sure about my thoughts on this transform. I do know what the trans

Re: Specifying type arguments for generic PTransform builders

2016-10-06 Thread Kenneth Knowles
Mostly my thoughts are the same as Robert's. Use #3 whenever possible, fall back to #1 otherwise, but please consider using informative names for your methods in all cases. #1 GBK.create(): IMO this pattern is best only for transforms where withBar is optional or there is no such method, as in GBK.

Re: [PROPOSAL] Introduce review mailing list and provide update on open discussion

2016-10-06 Thread Kenneth Knowles
+1 to rev...@beam.incubator.apache.org if it is turnkey for infra to set up, aka points 1 and 2. Even though I would not personally read it via email, getting the information in yet another format and infrastructure (and stewardship) is valuable for search, archival, and supporting diverse work st

Re: [PROPOSAL] New Beam website design?

2016-10-05 Thread Kenneth Knowles
Just because the thread got bumped... I kind of miss the old bucket of technical docs. They aren't user-facing, but I used it quite a lot. Perhaps instead of deleting it, move from "Learn" to "Contribute" or bury it somewhere near the bottom of the contributors' guide? On Wed, Oct 5, 2016 at 11:21

Re: [REMINDER] Technical discussion on the mailing list

2016-10-05 Thread Kenneth Knowles
This is a great idea. And it produces many easy starter tickets! :-) On Wed, Oct 5, 2016 at 4:51 AM Jean-Baptiste Onofré wrote: > Hi team, > > I would like to excuse myself to have forgotten to discuss and share with > you a technical point and generally speaking do a small reminder. > > When we

Re: Apex Runner support for View.CreatePCollectionView

2016-09-27 Thread Kenneth Knowles
llection).empty(); > > Is there a good place to look for a basic understanding of PAssert and what > the runner needs to support? > > Thanks, > Thomas > > > > On Thu, Sep 15, 2016 at 11:51 AM, Kenneth Knowles > wrote: > > > Hi Thomas, > > > > The

Re: Apex Runner support for View.CreatePCollectionView

2016-09-15 Thread Kenneth Knowles
Hi Thomas, The side inputs 1-pager is a forward-looking document for the design of side inputs in Beam once the portability layers are completed. The current SDK and implementations do not quite respect the same abstraction boundaries, even though they are similar. Here are some specifics about t

Re: About Finishing Triggers

2016-09-14 Thread Kenneth Knowles
Caveat: I want to emphasize that I don't have a specific proposal. I haven't thought through enough details to consider a proposal, or you would have seen it already :-) On Sep 14, 2016 5:14 AM, "Aljoscha Krettek" wrote: > > Hi, > I had a chat with Kenn at Flink Forward and he did an off-hand rem

Re: Remove legacy import-order?

2016-08-24 Thread Kenneth Knowles
+1 to import order. I don't care about actually enforcing formatting, but would add it to IDE tips and just make it an "OK topic for code review". Enforcing it would obscure a lot of the history of whom to talk to about pieces of code. And by the way there is a recent build of the IntelliJ

Re: Configuring IntelliJ to enforce checkstyle rules

2016-08-24 Thread Kenneth Knowles
Nice step-by-step. +1 to adding tips for particular IDEs in the contribution guide. On Wed, Aug 24, 2016 at 7:48 AM, Jean-Baptiste Onofré wrote: > Hi Stas, > > Thanks for sharing ! > > As discussed with Amit on Hangout (and indirectly with you ;)), it's what > I'm using in my config. > > Some s

Re: DoFN Lamdba

2016-08-09 Thread Kenneth Knowles
There are two bits here, I think. 1. "Map" from (InputT, W) -> OutputT makes sense to me. Likewise FlatMap. 2. ParDo.of() with capabilities analogous to DoFn could be useful in some cases. I'd start with #1. Kenn On Tue, Aug 9, 2016 at 8:23 AM, Lukasz Cwik wrote: > I was going to suggest to d
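To make idea #1 concrete, below is a minimal sketch of a lambda-friendly "Map" that receives both the element and its window, plus the FlatMap counterpart. `WindowedMapFn`, `WindowedFlatMapFn`, `Windowed`, and `LocalEval` are hypothetical names, not Beam classes. The in-memory evaluation only stands in for what a runner would do.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical "Map": (InputT, W) -> OutputT. Not a Beam API. */
@FunctionalInterface
interface WindowedMapFn<InputT, W, OutputT> {
  OutputT apply(InputT input, W window);
}

/** Hypothetical "FlatMap": (InputT, W) -> zero or more outputs. Not a Beam API. */
@FunctionalInterface
interface WindowedFlatMapFn<InputT, W, OutputT> {
  Iterable<OutputT> apply(InputT input, W window);
}

/** An element paired with its window (stand-in for a real windowed value). */
final class Windowed<T, W> {
  final T value;
  final W window;

  Windowed(T value, W window) {
    this.value = value;
    this.window = window;
  }
}

/** Toy, in-memory evaluation of both shapes. */
final class LocalEval {
  static <I, W, O> List<Windowed<O, W>> map(
      List<Windowed<I, W>> in, WindowedMapFn<I, W, O> fn) {
    List<Windowed<O, W>> out = new ArrayList<>();
    for (Windowed<I, W> e : in) {
      // Each output stays in the window of the element that produced it.
      out.add(new Windowed<>(fn.apply(e.value, e.window), e.window));
    }
    return out;
  }

  static <I, W, O> List<Windowed<O, W>> flatMap(
      List<Windowed<I, W>> in, WindowedFlatMapFn<I, W, O> fn) {
    List<Windowed<O, W>> out = new ArrayList<>();
    for (Windowed<I, W> e : in) {
      // Every emitted output inherits the input element's window.
      for (O o : fn.apply(e.value, e.window)) {
        out.add(new Windowed<>(o, e.window));
      }
    }
    return out;
  }
}
```

For example, `LocalEval.map(input, (s, w) -> s + "@" + w)` tags each string with its window. The FlatMap variant can emit zero or more outputs per element, and each one keeps the input's window.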

Re: [PROPOSAL] Website page or Jira to host all current proposal discussion and docs

2016-08-09 Thread Kenneth Knowles
es to the > model / SDK (this covers most of the 'yes' in your list, with the exception > of Pipeline#waitToFinish). > > Do you guys have ideas for other criteria ? (e.g. are new runners and DSLs > worth a BIP ?, or do Infrastructure issues deserve a BIP ?). > > Ismael

Re: [PROPOSAL] Website page or Jira to host all current proposal discussion and docs

2016-08-08 Thread Kenneth Knowles
+1 to the overall idea, though I would limit it to large and/or long-term proposals. I like: - JIRA for tracking: that's what it does best. - Google Docs for detailed commenting and revision - basically a wiki with easier commenting - Beam site page for process description and list of current

Re: [PROPOSAL] Having 2 Spark runners to support Spark 1 users while advancing towards better streaming implementation with Spark 2

2016-08-04 Thread Kenneth Knowles
+1. I definitely think it is important to support Spark 1 and 2 simultaneously, and I agree that side-by-side seems the best way to do it. I'll refrain from commenting on the specific technical aspects of the two runners and focus just on the split: I am also curious about the answer to Dan's quest

Re: [PROPOSAL] Pipeline Runner API design doc

2016-08-02 Thread Kenneth Knowles
not much left to discuss on the plan > representation. To me it seems pretty straightforward what has to be in > there and that is already more or less in. The only real thing missing are > triggers but there isn't yet a discussion about how that is going to work > out, correct? >

Re: [PROPOSAL] State and Timers for DoFn (aka per-key workflows)

2016-07-29 Thread Kenneth Knowles
cial example but I imagine there could be > real-world cases where this plays a role. > > Do we have any ideas on mitigating those kinds of problems or will we rely > on users properly understanding that this could happen in their pipeline? > > Cheers, > Aljoscha > > On

Re: [PROPOSAL] A brand new DoFn

2016-07-28 Thread Kenneth Knowles
pecially like how it both cleans up the API and allows more > >> optimizations in the future, especially with side inputs and the > different > >> methods for emitting. > >> > >> On Wed, 27 Jul 2016 at 06:49 Jean-Baptiste Onofré > wrote: > >> >

Re: [DISCUSS] cluster infrastructure - resource manager - for on going tests

2016-07-28 Thread Kenneth Knowles
nn IMHO the common deployment is Kafka (running standalone, because it > > only works that way), which also requires Zookeeper (if I'm not mistaken) > > and YARN, which all runners should be able to run on. > > > > On Thu, 28 Jul 2016 at 18:36 Kenneth Knowles > > w

Re: Suggestion for Writing Sink Implementation

2016-07-28 Thread Kenneth Knowles
ent Sinks for > >> writing data into Cassandra/Titan DB. My immediate goal is to run it on > >> Flink Runner. > >> > >> > >> > >> Regards > >> Sumit Chawla > >> > >> > >> On Thu, Jul 28, 2016 at 11:56 AM, Ken

Re: Suggestion for Writing Sink Implementation

2016-07-28 Thread Kenneth Knowles
Hi Sumit, I see what has happened here, from that snippet you pasted from the Flink runner's code [1]. Thanks for looking into it! The Flink runner today appears to reject Write.Bounded transforms in streaming mode if the sink is not an instance of UnboundedFlinkSink. The intent of that code, I b
