Re: How to serialize/deserialize a Pipeline object?

2016-12-21 Thread Kenneth Knowles
integration test without a Streams install? > > Choosing an intermediate representation that can be serialized and > sent to a cloud service (where it is then translated into the actual > implementation representation) is a fine solution. In fact that's what > Dataflow itself does. > >

Re: How to serialize/deserialize a Pipeline object?

2016-12-21 Thread Kenneth Knowles
Hi Shen, I want to tell you (1) how things work today and (2) how we want them to be eventually. (1) So far, each runner translates the Pipeline to their own graph format before serialization, so we have not yet encountered this issue. (2) We intend to make a standard mostly-readable JSON

Jenkins seed job breakage

2016-12-19 Thread Kenneth Knowles
Hi all, The massive Jenkins breakage just now was me updating the seed job in unfriendly ways. It should be all cleared up now. Apologies for that. I'll be trying to come up with safer ways to validate such changes in the future. Kenn

Re: Build failed in Jenkins: beam_SeedJob_Main #43

2016-12-19 Thread Kenneth Knowles
Context: PR #1640 has its LGTM. Before committing it, I am ensuring it works by running the seed job against it. This _will_ change the other jobs if/when it succeeds, but it will change them to what they are about to be anyhow. The failure here is not substantive. I built against origin/pr/1640

Re: [VOTE] Release 0.4.0-incubating, release candidate #3

2016-12-17 Thread Kenneth Knowles
+1, as long as it is fine for the release to be signed by a PMC member other than the release manager. Otherwise we need to replace the .asc file. Following [Apache release checklist](http://incubator.apache.org/guides/releasemanagement.html#check-list): 1.1 Verified checksums & signature

Re: Jenkins build became unstable: beam_PostCommit_Java_MavenInstall #2124

2016-12-16 Thread Kenneth Knowles
Prior to this coming in, it was already mostly rolled forwards in Dataflow. It would actually be counterproductive to revert in Beam as that would re-introduce the same bug in reverse. Filed BEAM-1172 to prevent in the future. On Fri, Dec 16, 2016 at 2:18 PM, Apache Jenkins Server <

Re: [VOTE] Release 0.4.0-incubating, release candidate #1

2016-12-15 Thread Kenneth Knowles
> I agree with Davor: I would prefer to cut a RC2. > > Regards > JB > > On Dec 15, 2016, 20:06, at 20:06, Kenneth Knowles <k...@google.com.INVALID> > wrote: > >Agreed. I had thought the issue in PR #1620 only affected Dataflow (in > >which > >case we cou

Re: [VOTE] Release 0.4.0-incubating, release candidate #1

2016-12-15 Thread Kenneth Knowles
Agreed. I had thought the issue in PR #1620 only affected Dataflow (in which case we could address it in the service), but it now also affects the Flink runner, so it should be included in the release. On Thu, Dec 15, 2016 at 10:46 AM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: >

Re: New testSideInputsWithMultipleWindows and should DoFnRunner explode if DoFn contains a side input ?

2016-12-14 Thread Kenneth Knowles
Yes, this is a bug in SimpleDoFnRunner (or at least a lack of clarity about whether it owns this), not in the Spark runner. FWIW the test is definitely correct, and runners-core has had this bug for a while. It is https://issues.apache.org/jira/browse/BEAM-1149 and I'm on it. On Wed, Dec 14, 2016 at

Re: Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Spark #409

2016-12-14 Thread Kenneth Knowles
This is still https://issues.apache.org/jira/browse/BEAM-1149. We recently added a test for it. The actual behavior has been broken for everyone for a while. It is half-fixed by Eugene K. (some DoFnRunners) but not all. On Wed, Dec 14, 2016 at 10:51 AM, Apache Jenkins Server <

Re: Jenkins build is still unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #1806

2016-12-13 Thread Kenneth Knowles
Failure in https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_RunnableOnService_Dataflow/1806/ is caused by https://github.com/apache/incubator-beam/pull/1541, which I am reverting. On Tue, Dec 13, 2016 at 3:16 PM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See

Re: [PROPOSAL] "IOChannelFactory" Redesign and Make it Configurable

2016-12-13 Thread Kenneth Knowles
). > > any thoughts and suggestions are welcome. > > Thanks > -- > Pei > > --- > [1]: > https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit?disco=A30vtPU#heading=h.p3gc3colc2cs > > [2]: > https://docs.google.com/document/d/11TdPyZ9_

Re: Beam Tuple

2016-12-13 Thread Kenneth Knowles
If the scope is really just tuples, then supposing a user chooses to go with Apache Commons tuples or javatuples it seems that the problem to be solved is easily providing coders for common data types that are not part of Beam. I think we should address this anyhow. The scope of having a common
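
For a sense of what a coder for a non-Beam tuple type involves, here is a minimal length-prefixed encoding of a pair of strings in plain Python. The function names are hypothetical and this is not Beam's Coder API; it only illustrates that a pair coder can be assembled by concatenating the component encodings, each prefixed with its length so the decoder knows where one component ends and the next begins.

```python
import struct
from typing import Tuple


def encode_str(s: str) -> bytes:
    data = s.encode("utf-8")
    # 4-byte big-endian length prefix so the decoder knows where s ends.
    return struct.pack(">I", len(data)) + data


def decode_str(buf: bytes, offset: int) -> Tuple[str, int]:
    # Read the length prefix, then the UTF-8 payload; return the next offset.
    (n,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    return buf[start:start + n].decode("utf-8"), start + n


def encode_pair(pair: Tuple[str, str]) -> bytes:
    # A pair encoding is just the two component encodings, one after the other.
    return encode_str(pair[0]) + encode_str(pair[1])


def decode_pair(buf: bytes) -> Tuple[str, str]:
    first, offset = decode_str(buf, 0)
    second, _ = decode_str(buf, offset)
    return first, second
```

The same composition idea extends to triples or to pairs of other component types, provided each component encoding is self-delimiting.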

Re: Jenkins pre/postcommit increased from 35m to 60m+ on Friday

2016-12-12 Thread Kenneth Knowles
INFO] > > [INFO] BUILD SUCCESS > [INFO] > -------- > [INFO] Total time: 34:30 min > [INFO] Finished at: 2016-12-09T18:50:49+00:00 > [INFO] Fin

Re: examples-java8 tests running slow

2016-12-12 Thread Kenneth Knowles
Yes, they are a bit harder to get fine-tuned executions. But they should only be run in the integration-test phase, not with unit tests. Is this happening when you run them locally or in Jenkins? On Mon, Dec 12, 2016 at 5:06 PM, Manu Zhang wrote: > Sorry, they are tests

Jenkins pre/postcommit increased from 35m to 60m+ on Friday

2016-12-12 Thread Kenneth Knowles
Hi all, We have a huge Jenkins backlog, surely exacerbated by the fact that our test time (precommit and postcommit mvn install) has roughly doubled in the last few days. Here's the quick link to the trend: https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_MavenInstall/buildTimeTrend

Re: New DoFn and WindowedValue/WinowingInternals

2016-12-11 Thread Kenneth Knowles
You've got it right. My recommendation is to just directly implement it for the Spark runner. It will often actually clean things up a bit. Here's the analogous change for the Flink runner: https://github.com/apache/incubator-beam/pull/1435/files. With GABW, I tried going through the process of

Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

2016-12-08 Thread Kenneth Knowles
ere no > more major changes to be made, or that they were all "ready to go". > > Are there any? If so, we should block the next release. > > On Fri, Dec 9, 2016 at 1:58 AM, Kenneth Knowles <k...@google.com.invalid> > wrote: > > > Thanks all! This has been do

Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

2016-12-08 Thread Kenneth Knowles
16 at 2:40 PM Tyler Akidau <taki...@google.com.invalid> > > wrote: > > > > > +1 > > > > > > On Thu, Dec 8, 2016 at 1:10 PM Jean-Baptiste Onofré <j...@nanthrax.net> > > > wrote: > > > > > > > +1 > > > > >

[DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

2016-12-07 Thread Kenneth Knowles
Hi all, I want to bring up another major backwards-incompatible change before it is too late, to resolve [BEAM-438]. Summary: Leave PInput.apply the same but rename PTransform.apply to PTransform.expand. I have opened [PR #1538] just for reference (it took 30 seconds using IDE automated
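
A toy model of the naming split, assuming only what the summary states: `PInput.apply` keeps its name, and the method a transform author overrides becomes `expand`. These Python classes (`PTransform`, `PCollection`, `Double`) are hypothetical and far simpler than the Beam SDK; they only show how the two roles read once they no longer share a name.

```python
class PTransform:
    """Hypothetical, simplified model of a composite transform."""

    def expand(self, pinput):
        # Transform authors override this to build their output.
        raise NotImplementedError


class PCollection:
    """Hypothetical stand-in for a PInput holding in-memory elements."""

    def __init__(self, elements):
        self.elements = list(elements)

    def apply(self, transform: PTransform) -> "PCollection":
        # User-facing entry point. A real SDK could record the application
        # in the pipeline graph here before delegating to expand().
        return transform.expand(self)


class Double(PTransform):
    """Example transform: doubles every element."""

    def expand(self, pinput):
        return PCollection(x * 2 for x in pinput.elements)
```

With distinct names, `PCollection([1, 2, 3]).apply(Double())` reads as the user's call, while `expand` is clearly the hook only transform authors implement.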

Re: [PROPOSAL] "IOChannelFactory" Redesign and Make it Configurable

2016-12-06 Thread Kenneth Knowles
Thanks for the thorough answers. It all sounds good to me. On Tue, Dec 6, 2016 at 12:57 PM, Pei He <pe...@google.com.invalid> wrote: > Thanks Kenn for the feedback and questions. > > I responded inline. > > On Mon, Dec 5, 2016 at 7:49 PM, Kenneth Knowles <k...@goo

Re: Jenkins build is unstable: beam_PostCommit_Java_RunnableOnService_Dataflow #1730

2016-12-05 Thread Kenneth Knowles
The error message looks like a transient error, though it is easy to believe this change could cause a problem. I will keep a sharp eye on it. On Mon, Dec 5, 2016 at 4:21 PM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See

Re: PAssertTest#runExpectingAssertionFailure() and waitUntilFinish()

2016-12-05 Thread Kenneth Knowles
Hi Stas, This is something special to TestPipeline and the test configuration for a runner. If runExpectingAssertionFailure() does not succeed, then our whole suite of RunnableOnService tests is not going to work, because they all have an assumption that TestPipeline#run() waits until the

Jenkins precommit worker affinity

2016-11-30 Thread Kenneth Knowles
It appears that the new job beam_PreCommit_Java_MavenInstall has an affinity for Jenkins worker beam3 while workers beam1 and beam2 sit idle. Is this intentional? There seems to be a backlog of half a dozen builds.

Re: Questions about coders

2016-11-30 Thread Kenneth Knowles
On Wed, Nov 30, 2016 at 3:52 PM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > Hello, > > Do we have anywhere a set of recommendations for developing new coders? I'm > confused by a couple of things: > > - Why are coders serialized by JSON serialization instead of by regular > Java

Re: Jenkins build is still unstable: beam_PostCommit_MavenVerify #1948

2016-11-30 Thread Kenneth Knowles
This is a Dataflow-specific linking error. I am investigating and proceeding with a temporary rollback. On Wed, Nov 30, 2016 at 3:06 PM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See > >

Re: Jenkins build became unstable: beam_Release_NightlySnapshot #249

2016-11-30 Thread Kenneth Knowles
This looks like it might have been the sort of thing that #1189 (just merged) will fix. On Tue, Nov 29, 2016 at 11:29 PM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See

Re: [DISCUSS] Graduation to a top-level project

2016-11-22 Thread Kenneth Knowles
+1 !!! I especially love how the diversity of the community has contributed to the conceptual growth and quality of Beam. I can't wait for more! On Tue, Nov 22, 2016 at 11:22 AM, Thomas Groh wrote: > +1 > > It's been a thrilling experience thus far, and I'm excited

Re: Batcher DoFn

2016-11-14 Thread Kenneth Knowles
Hi Josh, I think you probably mean something like buffering elements in a field on the DoFn, emitting batches as appropriate, and emitting the remainder in finishBundle. Unfortunately there are two issues: - in the presence of windowing the DoFn might be invoked in different windows, so you'll
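
A minimal sketch of the buffering pattern described above, written as plain Python rather than against the Beam SDK. `BatchingFn`, `process`, `finish_bundle`, and the `emit` callback are hypothetical stand-ins for a DoFn, its per-element and end-of-bundle hooks, and its output. The sketch deliberately ignores windows and timestamps, which is exactly the first issue raised above.

```python
from typing import Callable, List


class BatchingFn:
    """Hypothetical stand-in for a DoFn that buffers elements into batches.

    Windows and timestamps are ignored: every element lands in the same
    buffer, which is one of the problems with this pattern in Beam.
    """

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.buffer: List = []

    def process(self, element, emit: Callable[[list], None]) -> None:
        # Buffer the element; emit a full batch once the size is reached.
        self.buffer.append(element)
        if len(self.buffer) >= self.batch_size:
            emit(self.buffer)
            self.buffer = []

    def finish_bundle(self, emit: Callable[[list], None]) -> None:
        # Flush whatever remains at the end of the bundle.
        if self.buffer:
            emit(self.buffer)
            self.buffer = []


batches: List[list] = []
fn = BatchingFn(batch_size=3)
for x in range(7):
    fn.process(x, batches.append)
fn.finish_bundle(batches.append)
```

Here `batches` ends up as `[[0, 1, 2], [3, 4, 5], [6]]`: two full batches from `process` and the remainder from `finish_bundle`.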

Re: Introduction + contributing to docs

2016-11-11 Thread Kenneth Knowles
Welcome! It is great to witness the website really coming together. On Fri, Nov 11, 2016 at 12:35 PM, Amit Sela wrote: > Welcome Melissa! > > On Fri, Nov 11, 2016, 22:31 Jean-Baptiste Onofré wrote: > > > Hi Melissa, > > > > welcome aboard !! > > > >

Re: [PROPOSAL] Merge apex-runner to master branch

2016-11-11 Thread Kenneth Knowles
OK, I believe enough time has passed, and enough +1s, with caveats addressed agreeably, that we have reached consensus on this. LGTM! I'll limit technical details to the PR. On Fri, Nov 11, 2016 at 11:09 AM, Robert Bradshaw < rober...@google.com.invalid> wrote: > Thanks, David! +1 to getting this

Re: [jira] [Created] (BEAM-961) CountingInput could have starting number

2016-11-10 Thread Kenneth Knowles
into the codebase. On Thu, Nov 10, 2016 at 1:23 PM, Dan Halperin <dhalp...@google.com> wrote: > Why not support this in a follow-on pardo that shifts the range? > > On Thu, Nov 10, 2016 at 1:22 PM, Kenneth Knowles (JIRA) <j...@apache.org> > wrote: > >>

Re: [DISCUSS] Change "RunnableOnService" To A More Intuitive Name

2016-11-10 Thread Kenneth Knowles
> > +1 > > > > What I would really like to see is automatic derivation of the capability > > matrix from an extended Runner Test Suite. (As outlined in Thomas' doc). > > > > On Wed, 9 Nov 2016 at 21:42 Kenneth Knowles <k...@google.com.invalid> >

Re: SBT/ivy dependency issues

2016-11-09 Thread Kenneth Knowles
Hi Abbass, Seeing the output from `sbt dependency-tree` from the sbt-dependency-graph plugin [1] might help. (caveat: I did not try this out; I don't know the state of maintenance) Kenn [1] https://github.com/jrudolph/sbt-dependency-graph On Wed, Nov 9, 2016 at 6:33 AM, Jean-Baptiste Onofré

Re: PCollection to PCollection Conversion

2016-11-09 Thread Kenneth Knowles
aries that > >> for various reasons are not a part of the Apache Spark project: > >> https://spark-packages.org/. > >> > >> Maybe a "common-transformations" package would serve both users quick > >> ramp-up and ease-of-use while keeping Beam

Re: [PROPOSAL] Merge apex-runner to master branch

2016-11-09 Thread Kenneth Knowles
Hi Thomas, Very good point about establishing more clear definitions of the roles mentioned in the guidelines. Let's discuss in a separate thread. Kenn On Tue, Nov 8, 2016 at 1:03 PM, Thomas Weise wrote: > Thanks for the support. It may be helpful to describe the roles of >

Re: PCollection to PCollection Conversion

2016-11-08 Thread Kenneth Knowles
It seems useful for small scale debugging / demoing to have Dump.toString(). I think it should be named to clearly indicate its limited scope. Maybe other stuff could go in the Dump namespace, but "Dump.toJson()" would be for humans to read - so it should be pretty printed, not treated as a

Re: Verify a new Runner

2016-11-07 Thread Kenneth Knowles
Hi Zhixin, I would love to help you out with this. One of the best ways to test your runner is to enable the "RunnableOnService" test suite in the core SDK. Here is an example of the configuration for the Flink runner:

Re: Contributing to Beam docs

2016-11-03 Thread Kenneth Knowles
This is great. These menus seem really intuitive for finding what you need. I especially like the clarity in Get Started and Documentation. A pretty big challenge, since we have runners and SDKs that all need to be called out prominently in order to let users know what Beam is about. I had ~3

Re: PAssert.GroupedGlobally defaults to a single empty Iterable.

2016-11-02 Thread Kenneth Knowles
I meant "tests will [incorrectly] pass silently") > > On Wed, Nov 2, 2016 at 8:57 AM, Kenneth Knowles <k...@google.com> wrote: > > > FWIW if the runner is set up properly the tests will still fail with a > > timeout waiting for the assertion aggregators to reach e

Re: PAssert.GroupedGlobally defaults to a single empty Iterable.

2016-11-02 Thread Kenneth Knowles
The iterable is the entirety of the contents of the PCollection. So empty iterable -> empty PCollection. Making sure it is non-empty is actually the main purpose (and complexity) of this transform, because otherwise downstream asserts do not run. On Wed, Nov 2, 2016 at 5:20 AM Amit Sela
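
To make the empty-PCollection point concrete, here is a small plain-Python model. It is not PAssert's implementation; `naive_group`, `group_globally`, `run_checks`, and `expect_nonempty` are hypothetical names. Gathering the whole collection into one iterable always produces exactly one group, so a check still runs on empty input, whereas a grouping that emits nothing for empty input silently skips it.

```python
from typing import Callable, Iterable, List


def naive_group(elements: Iterable) -> List[list]:
    # Emits no group at all when there are no elements.
    items = list(elements)
    return [items] if items else []


def group_globally(elements: Iterable) -> List[list]:
    # Always emits exactly one group: the entire contents, possibly empty.
    return [list(elements)]


def run_checks(groups: List[list], check: Callable[[list], None]) -> int:
    # Applies the check to each group and reports how many times it ran.
    for group in groups:
        check(group)
    return len(groups)


def expect_nonempty(contents: list) -> None:
    assert contents, "expected a non-empty collection"
```

With `group_globally`, `expect_nonempty` fails on empty input, as a test should; with `naive_group`, the same check is never invoked and the test would pass silently.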

Re: PAssert.GroupedGlobally defaults to a single empty Iterable.

2016-11-02 Thread Kenneth Knowles
FWIW if the runner is set up properly the tests will still fail with a timeout waiting for the assertion aggregators to reach expected values. Unfortunately we haven't yet centralized this functionality into TestPipeline or thereabouts. On Wed, Nov 2, 2016 at 8:56 AM Dan Halperin

Re: Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-31 Thread Kenneth Knowles
Manu, I think your critique about user interface clarity is valid. CombineFn conflates a few operations and is not that clear about what it is doing or why. You seem to be concerned about CombineFn versus SerializableFunction constructors for the Combine family of transforms. I thought I'd respond

Re: migrating gearpump-runner to new DoFn fails with NotSerializableException

2016-10-30 Thread Kenneth Knowles
Hi Manu, That class is generated by DoFnInvokers, which generates bytecode to efficiently execute a DoFn. It should not be part of the serialized payload, but should be instantiated on the service/worker/etc. If you are trying to serialize a DoFnInvoker, then my recommendation is to serialize

Re: [DISCUSS] Merging master -> feature branch

2016-10-27 Thread Kenneth Knowles
In the spirit of explicitly summarizing and concluding threads on list: I think we have affirmative consensus to go for it when a downstream integration is completely conflict-free and fixup-free. On Thu, Oct 27, 2016 at 12:43 PM Robert Bradshaw wrote: > My concern

Re: [DISCUSS] Using Verbs for Transforms

2016-10-25 Thread Kenneth Knowles
Kenneth Knowles <k...@google.com> wrote: > I'd prefer to keep the vote focused on this rename, not a general policy. > > On Tue, Oct 25, 2016 at 10:26 PM Jean-Baptiste Onofré <j...@nanthrax.net> > wrote: > > Yes I would start a formal vote with the three proposals:

Re: [DISCUSS] Using Verbs for Transforms

2016-10-25 Thread Kenneth Knowles
I'd prefer to keep the vote focused on this rename, not a general policy. On Tue, Oct 25, 2016 at 10:26 PM Jean-Baptiste Onofré wrote: > Yes I would start a formal vote with the three proposals: descriptive > verb, adjective, verbs + adjective. > > Regards > JB > > On

Re: Apex runner integration tests

2016-10-25 Thread Kenneth Knowles
I've commented on a PR but also want to respond here. In the precommit, we run https://builds.apache.org/job/beam_PreCommit_MavenVerify/ which uses -Pjenkins-precommit to select very few integration tests. It should just be unit tests and integration tests based on our examples. This catches the

Re: [VOTE] Release 0.3.0-incubating, release candidate #1

2016-10-25 Thread Kenneth Knowles
+1 (binding) On Tue, Oct 25, 2016 at 5:26 PM Dan Halperin wrote: > My reading of the LEGAL threads is that since we are not including (shading > or bundling) the ASL-licensed code we are fine to distribute kinesis-io > module. This was the original conclusion that

[DISCUSS] Merging master -> feature branch

2016-10-25 Thread Kenneth Knowles
Hi all, While collaborating on the apex-runner branch, the issue of how best to continuously merge master into the feature branch came up. IMO it differs somewhat from normal commits in two notable ways: 1. Modulo fix-ups, it is actually not adding any new code to the overall codebase, so

Re: The Availability of PipelineOptions

2016-10-25 Thread Kenneth Knowles
In the spirit of some recent conversations about tracking proposals like this, are there JIRAs you can [file and then] mention on this thread? On Tue, Oct 25, 2016 at 2:07 PM Kenneth Knowles <k...@google.com> wrote: > Yea +1. Definitely a real prerequisite to a true runner-independ

Re: The Availability of PipelineOptions

2016-10-25 Thread Kenneth Knowles
Yea +1. Definitely a real prerequisite to a true runner-independent graph. On Tue, Oct 25, 2016 at 1:24 PM Amit Sela wrote: > +1 > > On Tue, Oct 25, 2016 at 8:43 PM Robert Bradshaw > > wrote: > > > +1 > > > > On Tue, Oct 25, 2016 at 7:26 AM,

Re: [ANNOUNCEMENT] New committers!

2016-10-21 Thread Kenneth Knowles
Huzzah! I've personally enjoyed working together, and I am glad to extend this acknowledgement and welcome this addition to the Beam community. Kenn On Fri, Oct 21, 2016 at 3:18 PM Davor Bonaci wrote: > Hi everyone, > Please join me and the rest of Beam PPMC in welcoming the

Re: Placement of temporary files by FileBasedSink

2016-10-20 Thread Kenneth Knowles
good enough. > > > On Thu, Oct 20, 2016 at 10:14 AM Robert Bradshaw > <rober...@google.com.invalid> wrote: > > > On Thu, Oct 20, 2016 at 9:58 AM, Kenneth Knowles <k...@google.com.invalid > > > > wrote: > > > I like the spirit of proposal #1 for addre

Re: Start of release 0.3.0-incubating

2016-10-20 Thread Kenneth Knowles
Aljoscha, I'm very interested in hearing how easy it is, or how fast we think it could get, from your perspective as first time release manager. The more frequent releases we have (eventually minor or patch version only) the less these concerns impact users. On Thu, Oct 20, 2016, 10:26 Jesse

Re: Release Guide

2016-10-20 Thread Kenneth Knowles
This is really nice. Very readable and streamlined. On Thu, Oct 20, 2016 at 7:44 AM Aljoscha Krettek wrote: > Hi, > thanks for taking the time and writing this extensive doc! > > If no-one is against this I would like to be the release manager for the > next

Re: [KUDOS] Contributed runner: Apache Apex!

2016-10-17 Thread Kenneth Knowles
*I would like to :-) On Mon, Oct 17, 2016 at 9:51 AM Kenneth Knowles <k...@google.com> wrote: > Hi all, > > I would to, once again, call attention to a great addition to Beam: a > runner for Apache Apex. > > After lots of review and much thoughtful revision, pull reques

[KUDOS] Contributed runner: Apache Apex!

2016-10-17 Thread Kenneth Knowles
Hi all, I would to, once again, call attention to a great addition to Beam: a runner for Apache Apex. After lots of review and much thoughtful revision, pull request #540 has been merged to the apex-runner feature branch today. Please do take a look, and help us put the finishing touches on it

Re: [PROPOSAL] State and Timers for DoFn (aka per-key workflows)

2016-10-14 Thread Kenneth Knowles
2:26 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote: > +1 > > It sounds very good. > > Regards > JB > > On 07/27/2016 05:20 AM, Kenneth Knowles wrote: > > Hi everyone, > > > > > > I would like to offer a proposal for a much-requested feature in Be

Re: Specifying type arguments for generic PTransform builders

2016-10-13 Thread Kenneth Knowles
i, Oct 7, 2016 at 4:48 PM, Eugene Kirpichov < > kirpic...@google.com.invalid> wrote: > > > In my original email, all FooBuilder's should be simply Foo. Sorry for > the > > confusion. > > > > On Thu, Oct 6, 2016 at 3:08 PM Kenneth Knowles <k...@google.com.invalid>

Re: Simplifying User-Defined Metrics in Beam

2016-10-12 Thread Kenneth Knowles
Correction: In my eagerness to see the end of aggregators, I mistook the intention. Both A and B leave aggregators in place until there is a replacement. In which case, I am strongly in favor of B. As soon as we can remove aggregators, I think we should. On Wed, Oct 12, 2016 at 10:48 AM Kenneth

Re: Simplifying User-Defined Metrics in Beam

2016-10-12 Thread Kenneth Knowles
Huzzah! This is IMO a really great change. I agree that we can get something in to allow work to continue, and improve the API as we learn. On Wed, Oct 12, 2016 at 10:20 AM Ben Chambers wrote: > 3. One open question is what to do with Aggregators. In the doc I

Re: [PROPOSAL] New Beam website design?

2016-10-05 Thread Kenneth Knowles
Just because the thread got bumped... I kind of miss the old bucket of technical docs. They aren't user-facing, but I used it quite a lot. Perhaps instead of deleting it, move from "Learn" to "Contribute" or bury it somewhere near the bottom of the contributors' guide? On Wed, Oct 5, 2016 at

Re: [REMINDER] Technical discussion on the mailing list

2016-10-05 Thread Kenneth Knowles
This is a great idea. And it produces many easy starter tickets! :-) On Wed, Oct 5, 2016 at 4:51 AM Jean-Baptiste Onofré wrote: > Hi team, > > I would like to excuse myself to have forgotten to discuss and share with > you a technical point and generally speaking do a small

Re: Apex Runner support for View.CreatePCollectionView

2016-09-15 Thread Kenneth Knowles
Hi Thomas, The side inputs 1-pager is a forward-looking document for the design of side inputs in Beam once the portability layers are completed. The current SDK and implementations do not quite respect the same abstraction boundaries, even though they are similar. Here are some specifics about

Re: About Finishing Triggers

2016-09-14 Thread Kenneth Knowles
Caveat: I want to emphasize that I don't have a specific proposal. I haven't thought through enough details to consider a proposal, or you would have seen it already :-) On Sep 14, 2016 5:14 AM, "Aljoscha Krettek" wrote: > > Hi, > I had a chat with Kenn at Flink Forward and

Re: Remove legacy import-order?

2016-08-24 Thread Kenneth Knowles
+1 to import order. I don't care about actually enforcing formatting, but I would add it to IDE tips and just make it an "OK topic for code review". Enforcing it would result in obscuring a lot of history for who to talk to about pieces of code. And by the way there is a recent build of the

Re: Configuring IntelliJ to enforce checkstyle rules

2016-08-24 Thread Kenneth Knowles
Nice step-by-step. +1 to adding tips for particular IDEs in the contribution guide. On Wed, Aug 24, 2016 at 7:48 AM, Jean-Baptiste Onofré wrote: > Hi Stas, > > Thanks for sharing ! > > As discussed with Amit on Hangout (and indirectly with you ;)), it's what > I'm using in

Re: [PROPOSAL] Website page or Jira to host all current proposal discussion and docs

2016-08-09 Thread Kenneth Knowles
BIP: Changes to the > model / SDK (this covers most of the 'yes' in your list, with the exception > of Pipeline#waitToFinish). > > Do you guys have ideas for other criteria ? (e.g. are new runners and DSLs > worth a BIP ?, or do Infrastructure issues deserve a BIP ?). > > Ismael

Re: [PROPOSAL] Having 2 Spark runners to support Spark 1 users while advancing towards better streaming implementation with Spark 2

2016-08-04 Thread Kenneth Knowles
+1. I definitely think it is important to support Spark 1 and 2 simultaneously, and I agree that side-by-side seems the best way to do it. I'll refrain from commenting on the specific technical aspects of the two runners and focus just on the split: I am also curious about the answer to Dan's

Re: [PROPOSAL] Pipeline Runner API design doc

2016-08-02 Thread Kenneth Knowles
y at this point, at > > https://github.com/apache/incubator-beam/pull/662. I'd love if you took > a > > look at the notes on the PR and briefly at the schema; I'll continue to > > evolve it according to current & future feedback. > > > > Kenn > >

Re: Suggestion for Writing Sink Implementation

2016-07-28 Thread Kenneth Knowles
Thanks for looking into it. I am currently trying to implement Sinks for > >> writing data into Cassandra/Titan DB. My immediate goal is to run it on > >> Flink Runner. > >> > >> > >> > >> Regards > >> Sumit Chawla > >> >

[PROPOSAL] A brand new DoFn

2016-07-26 Thread Kenneth Knowles
Hi all, I have a major new feature to propose: the next generation of DoFn. It sounds a bit grandiose, but I think it is the best way to understand the proposal. This is strongly motivated by the design for state and timers, aka "per-key workflows". Since the two features are separable and have

[PROPOSAL] State and Timers for DoFn (aka per-key workflows)

2016-07-26 Thread Kenneth Knowles
Hi everyone, I would like to offer a proposal for a much-requested feature in Beam: Stateful processing in a DoFn. Please check out and comment on the proposal at this URL: https://s.apache.org/beam-state This proposal includes user-facing APIs for persistent state and timers. Together,

Re: Jenkins build is still unstable: beam_PostCommit_RunnableOnService_SparkLocal #12

2016-07-25 Thread Kenneth Knowles
Looks like it didn't take. I don't think it can be done via the maven command line. I think you may need to put this into the section of the pom for it to get plumbed in the needed way. In searching about, I noticed that it is an internal system property, not documented (why not?), so we might

Re: Getting started with contribution

2016-07-23 Thread Kenneth Knowles
Hi Minudika, Happy to hear from you! We have labels in JIRA that you can look at to find starter tasks that interest you. The tags are "starter" and "newbie" and "easyfix". They all mean the same thing. Take a look and if you find something that sounds interesting, comment on the ticket and we

Re: [Proposal] Add waitToFinish(), cancel(), waitToRunning() to PipelineResult.

2016-07-21 Thread Kenneth Knowles
On Thu, Jul 21, 2016 at 1:42 PM, Robert Bradshaw < rober...@google.com.invalid> wrote: > > (Totally backwards incompatible, we could call this p.launch() for > clarity, and maybe keep a run as run() { return > p.launch().waitUntilFinish(); }.) > I must say this reads really well.
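
Below is a rough sketch of the quoted `launch()`/`run()` split in plain Python, assuming a pipeline can be modeled as a single callable executed on a background thread. `Pipeline`, `PipelineResult`, `launch`, `run`, and `wait_until_finish` are hypothetical names chosen to mirror the thread, not an existing API.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class PipelineResult:
    """Hypothetical handle to a launched pipeline."""

    def __init__(self, future: Future) -> None:
        self._future = future

    def wait_until_finish(self, timeout=None) -> str:
        # Blocks until the work completes (re-raising any failure),
        # then reports a terminal state.
        self._future.result(timeout=timeout)
        return "DONE"


class Pipeline:
    """Hypothetical pipeline whose 'work' is just a callable."""

    def __init__(self, work) -> None:
        self._work = work
        self._executor = ThreadPoolExecutor(max_workers=1)

    def launch(self) -> PipelineResult:
        # Non-blocking: submit the work and return a handle immediately.
        return PipelineResult(self._executor.submit(self._work))

    def run(self) -> str:
        # Blocking convenience wrapper, mirroring the quoted suggestion
        # run() { return p.launch().waitUntilFinish(); }
        return self.launch().wait_until_finish()
```

The design point is that `launch()` behaves like an ordinary async request, so `main()` is free to do other work, while `run()` keeps the familiar blocking behavior for simple programs and tests.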

Re: [Proposal] Add waitToFinish(), cancel(), waitToRunning() to PipelineResult.

2016-07-21 Thread Kenneth Knowles
e: > > > TestPipeline is probably the one runner that can be expected to block, > as > > > certainly JUnit tests and likely other tests will run the Pipeline, and > > > succeed, even if the PipelineRunner throws an exception. Luckily, this > > can > > > be add

Re: [Proposal] Add waitToFinish(), cancel(), waitToRunning() to PipelineResult.

2016-07-21 Thread Kenneth Knowles
I like this proposal. It makes pipeline.run() seem like a pretty normal async request, and easy to program with. It removes the implicit assumption in the prior design that main() is pretty much just "build and run a pipeline". The part of this that I care about most is being able to write a

[KUDOS] Contributed runner: Gearpump!

2016-07-20 Thread Kenneth Knowles
Hi all, I would like to call attention to a huge contribution to Beam: a runner for Apache Gearpump (incubating). The runner landed on the gearpump-runner feature branch today. Check it out! And contribute towards getting it ready for the master branch :-) Please join me in special

Re: [PROPOSAL] CoGBK as primitive transform instead of GBK

2016-07-20 Thread Kenneth Knowles
+1 I assume that the intent is for the semantics of both GBK and CoGBK to be unchanged, just swapping their status as primitives. This seems like a good change, with strictly positive impact on users and SDK authors, with only an extremely minor burden (doing an insertion of the provided

Re: [PROPOSAL] Pipeline Runner API design doc

2016-07-14 Thread Kenneth Knowles
ttps://github.com/apache/incubator-beam/pull/662. I'd love if you took a look at the notes on the PR and briefly at the schema; I'll continue to evolve it according to current & future feedback. Kenn On Wed, Mar 23, 2016 at 2:17 PM, Kenneth Knowles <k...@google.com> wrote: > Hi everyone,

Re: Improvements to issue/version tracking

2016-06-28 Thread Kenneth Knowles
+1 On Tue, Jun 28, 2016 at 12:06 AM, Jean-Baptiste Onofré wrote: > +1 > > Regards > JB > > > On 06/28/2016 01:01 AM, Davor Bonaci wrote: > >> Hi everyone, >> I'd like to propose a simple change in Beam JIRA that will hopefully >> improve our issue and version tracking -- to

Re: What is the "Keyed State" in the capability matrix?

2016-06-24 Thread Kenneth Knowles
Hi Shen, The row refers to the ability for a DoFn in a ParDo to access per-key (and window) state cells that persist beyond the lifetime of an element or bundle. This is a feature that was in the later stages of design when the Beam code was donated. Hence it is a row in the graph, but even the Beam

Re: [DISCUSS] Beam data plane serialization tech

2016-06-22 Thread Kenneth Knowles
> > > On Fri, Jun 17, 2016 at 12:47 PM Aljoscha Krettek <aljos...@apache.org > > > > > wrote: > > > > > > > Hi, > > > > am I correct in assuming that the transmitted envelopes would mostly > > > > contain coder-serialized values?

Re: Running examples with different runners

2016-06-21 Thread Kenneth Knowles
To expand on the RunnableOnService test suggestion, here [1] is the commit from the Spark runner. You will get a lot more information if you can port this for your runner than you would from an example end-to-end test. Note that this just pulls in the tests from the core SDK. For testing with

Re: Using GroupAlsoByWindowViaWindowSetDoFn for stream of input element

2016-06-21 Thread Kenneth Knowles
Broadly, yes: This and other semantics-preserving transformations are (by definition) runner-independent and we have a home for them in mind. The place we would put them is in the nascent runners-core module, which is generally the place for utilities for helping to implement runners. An

Re: 0.1.0-incubating release

2016-06-07 Thread Kenneth Knowles
+1 Lovely. Very readable. The "-parent" artifacts are just leaked implementation details of our build configuration that no one should ever actually reference, right? Kenn On Tue, Jun 7, 2016 at 8:54 AM, Dan Halperin wrote: > +2! This seems most concordant with

Re: Fewer number of minor/trivial issues

2016-05-23 Thread Kenneth Knowles
We do already have a couple of issues labeled as "starter" for just this purpose. I don't care much about the actual name; there are different words people think of ("easy-win", "starter", "newbie", "low-hanging-fruit") so probably it would be useful to have a good Jira search linked from the

Re: [DISCUSS] Adding Some Sort of SideInputRunner

2016-04-28 Thread Kenneth Knowles
On Thu, Apr 28, 2016 at 1:26 AM, Aljoscha Krettek wrote: > Bump. > > I'm afraid this might have gotten lost during the conferences/summits. > > On Thu, 21 Apr 2016 at 13:30 Aljoscha Krettek wrote: > > > Ok, I'll try and start such a design. Before I can

Re: [DISCUSS] Adding Some Sort of SideInputRunner

2016-04-20 Thread Kenneth Knowles
Hi Aljoscha, Great idea. - The logic for matching up the windows is WindowFn#getSideInputWindow [1] - The SDK used to have something along the lines of what you describe [2] but we thought it was too runner-specific, directly referencing Dataflow details, and with a particular model of

Re: A question about windowed values

2016-04-13 Thread Kenneth Knowles
Good thread. Filed as https://issues.apache.org/jira/browse/BEAM-191. On Wed, Apr 13, 2016 at 10:08 AM, Amit Sela wrote: > First of all, Thanks for the detailed explanation! > > I can say that from my point of view (as a runner developer) this is > definitely confusing,

Re: TextIO.Read.Bound vs Create

2016-04-13 Thread Kenneth Knowles
This seems wrong. They should both be in the global window. I think your trouble is https://github.com/apache/incubator-beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java#L472 On Tue, Apr 12, 2016 at 9:43 PM, Amit Sela

Re: Add Beam up to https://analysis.apache.org?

2016-04-12 Thread Kenneth Knowles
les. > > HTH, > Nitin > [1] > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20INFRA%20AND%20text%20~%20%22Sonar%22 > > > > > On Wed, Mar 30, 2016 at 1:43 PM, Kenneth Knowles <k...@google.com.invalid> > wrote: > > > Something to look forward

Re: Add Beam up to https://analysis.apache.org?

2016-03-30 Thread Kenneth Knowles
ache TLPs only. > > In the meantime, following could be used for project stats for mails [1] > and git commits [2]. > > Nitin > [1] http://markmail.org/search/?q=Apache+Beam > [2] https://biterg.io > > > On Tue, Mar 29, 2016 at 10:09 AM, Kenneth Knowles <k...@google.c

Add Beam up to https://analysis.apache.org?

2016-03-29 Thread Kenneth Knowles
This site looks very fun, possibly enlightening. Not urgent at all, but is there just a bit to flip to get Beam added? Kenn

[PROPOSAL] Pipeline Runner API design doc

2016-03-23 Thread Kenneth Knowles
Hi everyone, Incorporating the feedback from the 1-pager I circulated a week ago, I have put together a concrete design document for the new API(s). https://docs.google.com/document/d/1bao-5B6uBuf-kwH1meenAuXXS0c9cBQ1B2J59I3FiyI/edit?usp=sharing I appreciate any and all feedback on the design.

Re: Capability matrix question

2016-03-23 Thread Kenneth Knowles
+1 to considering "metric" / PMetric / etc. On Wed, Mar 23, 2016 at 8:09 AM, Amit Sela wrote: > How about "PMetric" ? > > On Wed, Mar 23, 2016, 16:53 Frances Perry wrote: > >> Perhaps I'm unclear on what an “Aggregator” is. I assumed that a line

Re: Renaming process: first step Maven coordonates

2016-03-22 Thread Kenneth Knowles
+1 JB. If it works for other incubating projects, then I'm happy to proceed. On Mon, Mar 21, 2016 at 1:00 PM, Jean-Baptiste Onofré wrote: > Hi Ben, > > it works fine with Maven >= 3.2.x (current version is 3.3.9). > > Most of incubator projects use x.x.x-incubating-SNAPSHOT:

Re: Committer workflow

2016-03-21 Thread Kenneth Knowles
+1 for emphasizing the knowledge-sharing aspect of review. I think it is the most important for project health, and the most fun too! I love the chance to learn about a new piece of code (or learn how I messed up in my own code :-)