Re: [DISCUSS] cluster infrastructure - resource manager - for on going tests

2016-07-28 Thread Kenneth Knowles
Presumably we'll eventually also run additional services alongside (like Kafka) to have true integration tests for I/O connectors. What is the common deployment in this case? On Jul 28, 2016 06:35, "Amit Sela" wrote: > So what would be the preferred resource manager to test Flink on ? > > On Thu

Re: Build failed in Jenkins: beam_Release_NightlySnapshot #115

2016-07-27 Thread Kenneth Knowles
This is day 2 of proxy errors when deploying the nightly snapshot, right? Apologies if I missed a thread about the prior failure. Is anyone already looking at this? On Wed, Jul 27, 2016 at 12:03 AM, Apache Jenkins Server < jenk...@builds.apache.org> wrote: > See < > https://builds.apache.org/job/

[PROPOSAL] A brand new DoFn

2016-07-26 Thread Kenneth Knowles
Hi all, I have a major new feature to propose: the next generation of DoFn. It sounds a bit grandiose, but I think it is the best way to understand the proposal. This is strongly motivated by the design for state and timers, aka "per-key workflows". Since the two features are separable and have

[PROPOSAL] State and Timers for DoFn (aka per-key workflows)

2016-07-26 Thread Kenneth Knowles
Hi everyone, I would like to offer a proposal for a much-requested feature in Beam: Stateful processing in a DoFn. Please check out and comment on the proposal at this URL: https://s.apache.org/beam-state This proposal includes user-facing APIs for persistent state and timers. Together, the

Re: Jenkins build is still unstable: beam_PostCommit_RunnableOnService_SparkLocal #12

2016-07-25 Thread Kenneth Knowles
Looks like it didn't take. I don't think it can be done via the maven command line. I think you may need to put this into the section of the pom for it to get plumbed in the needed way. In searching about, I noticed that it is an internal system property, not documented (why not?), so we might als

Re: [Discuss] Beam SDK (Java) providing a shaded jar as a dependency

2016-07-25 Thread Kenneth Knowles
The way I see this issue is that shading is a tool that we are using to imperfectly implement two kinds of dependencies: 1. Private dependencies, which are implementation details. 2. Public dependencies, which are observable through the API surface. The SDK and user are required to share the same

Re: Getting started with contribution

2016-07-23 Thread Kenneth Knowles
Hi Minudika, Happy to hear from you! We have labels in JIRA that you can look at to find starter tasks that interest you. The tags are "starter" and "newbie" and "easyfix". They all mean the same thing. Take a look and if you find something that sounds interesting, comment on the ticket and we c

Re: [Proposal] Add waitToFinish(), cancel(), waitToRunning() to PipelineResult.

2016-07-21 Thread Kenneth Knowles
On Thu, Jul 21, 2016 at 1:42 PM, Robert Bradshaw < rober...@google.com.invalid> wrote: > > (Totally backwards incompatible, we could call this p.launch() for > clarity, and maybe keep a run as run() { return > p.launch().waitUntilFinish(); }.) > I must say this reads really well.
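
For readers skimming the archive, here is a rough sketch of the shape quoted above: launch() returns a handle right away, and run() is kept as a blocking wrapper around it. Everything here (LaunchSketch, Pipeline, Result, State) is a hypothetical stand-in built on java.util.concurrent, not the Beam API, and the "pipeline" does no real work.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical names throughout; this is an illustration, not the Beam API.
class LaunchSketch {
    enum State { RUNNING, DONE }

    /** Handle returned by launch(); stands in for the result object under discussion. */
    static final class Result {
        private final CompletableFuture<State> future;

        Result(CompletableFuture<State> future) {
            this.future = future;
        }

        /** Blocks the caller until the asynchronous work has completed. */
        State waitUntilFinish() {
            return future.join();
        }

        /** Non-blocking status check. */
        State getState() {
            return future.isDone() ? State.DONE : State.RUNNING;
        }
    }

    static final class Pipeline {
        /** Starts the (placeholder) work on another thread and returns immediately. */
        Result launch() {
            return new Result(CompletableFuture.supplyAsync(() -> State.DONE));
        }

        /** Blocking convenience wrapper: launch, then wait. */
        State run() {
            return launch().waitUntilFinish();
        }
    }

    public static void main(String[] args) {
        Pipeline p = new Pipeline();
        Result handle = p.launch();                    // does not block
        System.out.println(handle.waitUntilFinish());  // prints DONE
        System.out.println(p.run());                   // prints DONE
    }
}
```

With this split, a caller that wants to poll or cancel keeps the handle from launch(), while existing callers that expect blocking behavior keep using run().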

Re: [Proposal] Add waitToFinish(), cancel(), waitToRunning() to PipelineResult.

2016-07-21 Thread Kenneth Knowles
ected to block, > as > > > certainly JUnit tests and likely other tests will run the Pipeline, and > > > succeed, even if the PipelineRunner throws an exception. Luckily, this > > can > > > be added to TestPipeline.run(), which already has additional behavior >

Re: [Proposal] Add waitToFinish(), cancel(), waitToRunning() to PipelineResult.

2016-07-21 Thread Kenneth Knowles
I like this proposal. It makes pipeline.run() seem like a pretty normal async request, and easy to program with. It removes the implicit assumption in the prior design that main() is pretty much just "build and run a pipeline". The part of this that I care about most is being able to write a progr

[KUDOS] Contributed runner: Gearpump!

2016-07-20 Thread Kenneth Knowles
Hi all, I would like to call attention to a huge contribution to Beam: a runner for Apache Gearpump (incubating). The runner landed on the gearpump-runner feature branch today. Check it out! And contribute towards getting it ready for the master branch :-) Please join me in special congratulatio

Re: [PROPOSAL] CoGBK as primitive transform instead of GBK

2016-07-20 Thread Kenneth Knowles
+1 I assume that the intent is for the semantics of both GBK and CoGBK to be unchanged, just swapping their status as primitives. This seems like a good change, with strictly positive impact on users and SDK authors, with only an extremely minor burden (doing an insertion of the provided implemen

Re: [PROPOSAL] Pipeline Runner API design doc

2016-07-14 Thread Kenneth Knowles
oint, at https://github.com/apache/incubator-beam/pull/662. I'd love if you took a look at the notes on the PR and briefly at the schema; I'll continue to evolve it according to current & future feedback. Kenn On Wed, Mar 23, 2016 at 2:17 PM, Kenneth Knowles wrote: > Hi everyone, >

Re: Improvements to issue/version tracking

2016-06-28 Thread Kenneth Knowles
+1 On Tue, Jun 28, 2016 at 12:06 AM, Jean-Baptiste Onofré wrote: > +1 > > Regards > JB > > > On 06/28/2016 01:01 AM, Davor Bonaci wrote: > >> Hi everyone, >> I'd like to propose a simple change in Beam JIRA that will hopefully >> improve our issue and version tracking -- to actually use the "Fix

Re: How to control the parallelism when run ParDo on PCollection?

2016-06-24 Thread Kenneth Knowles
Hi Shen, It is completely up to the runner how to divide things into bundles: it is one item of work that should fail or succeed atomically. Bundling limits parallelism, but does not determine it. For example, a streaming execution may have many bundles over time as elements arrive, regardless of

Re: What is the "Keyed State" in the capability matrix?

2016-06-24 Thread Kenneth Knowles
Hi Shen, The row refers to the ability for a DoFn in a ParDo to access per-key (and window) state cells that persist beyond the lifetime of an element or bundle. This is a feature that was in the later stages of design when the Beam code was donated. Hence it is a row in the graph, but even the Beam

Re: Scala DSL

2016-06-24 Thread Kenneth Knowles
My +1 goes to dsls/scio. It already has a cool name, so let's use it. And there might be other Scala-based DSLs. On Fri, Jun 24, 2016 at 8:39 AM, Ismaël Mejía wrote: > Hello everyone, > > Neville, thanks a lot for your contribution. Your work is amazing and I am > really happy that this scala i

Re: [DISCUSS] Beam data plane serialization tech

2016-06-22 Thread Kenneth Knowles
a Krettek > > > > wrote: > > > > > > > Hi, > > > > am I correct in assuming that the transmitted envelopes would mostly > > > > contain coder-serialized values? If so, wouldn't the header of an > > > envelope > > >

Re: Running examples with different runners

2016-06-21 Thread Kenneth Knowles
To expand on the RunnableOnService test suggestion, here [1] is the commit from the Spark runner. You will get a lot more information if you can port this for your runner than you would from an example end-to-end test. Note that this just pulls in the tests from the core SDK. For testing with othe

Re: Using GroupAlsoByWindowViaWindowSetDoFn for stream of input element

2016-06-21 Thread Kenneth Knowles
Broadly, yes: This and other semantics-preserving transformations are (by definition) runner-independent and we have a home for them in mind. The place we would put them is in the nascent runners-core module, which is generally the place for utilities for helping to implement runners. An optimizat

Re: [DISCUSS] Beam data plane serialization tech

2016-06-16 Thread Kenneth Knowles
(Apologies for the formatting) On Thu, Jun 16, 2016 at 12:12 PM, Kenneth Knowles wrote: > Hello everyone! > > We are busily working on a Runner API (for building and transmitting > pipelines) > and a Fn API (for invoking user-defined functions found within pipelines) > as >

[DISCUSS] Beam data plane serialization tech

2016-06-16 Thread Kenneth Knowles
Hello everyone! We are busily working on a Runner API (for building and transmitting pipelines) and a Fn API (for invoking user-defined functions found within pipelines) as outlined in the Beam technical vision [1]. Both of these require a language-independent serialization technology for interope

Re: [VOTE] Release version 0.1.0-incubating

2016-06-09 Thread Kenneth Knowles
+1 (binding) Confirmed that we can run pipelines on Dataflow. Looks good. Very exciting! On Thu, Jun 9, 2016 at 8:16 AM, Jean-Baptiste Onofré wrote: > Team work ! Special thanks to Davor and Dan ! And thanks to the entire > team: it's a major step forward (the first release is always the hard

Re: 0.1.0-incubating release

2016-06-07 Thread Kenneth Knowles
+1 Lovely. Very readable. The "-parent" artifacts are just leaked implementation details of our build configuration that no one should ever actually reference, right? Kenn On Tue, Jun 7, 2016 at 8:54 AM, Dan Halperin wrote: > +2! This seems most concordant with other Apache products and the m

[DESIGN DOC] Reference document for triggers in Beam

2016-05-30 Thread Kenneth Knowles
Hi everyone, In advance of my talk on triggers at Strata+Hadoop World, London, I have prepared a Beam version of my reference document for triggers. Abstract: This document explains the semantics of triggers from a somewhat formal perspective, and goes on to derive the constraints on their implem

Re: Fewer number of minor/trivial issues

2016-05-23 Thread Kenneth Knowles
We do already have a couple of issues labeled as "starter" for just this purpose. I don't care much about the actual name; there are different words people think of ("easy-win", "starter", "newbie", "low-hanging-fruit") so probably it would be useful to have a good Jira search linked from the contr

[1-Pager] Side Input Architecture

2016-05-13 Thread Kenneth Knowles
Hi all, Luke & I put together a 1-pager* with lots of diagrams that just goes into some more detail on the runner-independent side input story for the Beam Runner/Fn API. https://s.apache.org/beam-side-inputs-1-pager Here is the teaser line: "What happens when a user requests to side input

Re: Using Side Inputs to Join with Static Data Sets

2016-05-13 Thread Kenneth Knowles
I think it may help to unpack the override & expansion of View.asXYZ() in the DataflowPipelineRunner [1] and the InProcessPipelineRunner [2] Each of these does: 1. some preparation, perhaps 2. concatenate the side input PColl into a single iterable (there's a GBK here; triggering) 3. materialize

Re: add component tag to pull request title / commit comment

2016-05-12 Thread Kenneth Knowles
For the BEAM-12345 tag, we have only really been doing that for PR titles, not commits, right? Then as long as you can read the real title on https://github.com/apache/incubator-beam/pulls I'm happy, and for my screen layout I think as many as 150 chars would be fine. I will experiment with tagging

Re: [DISCUSS] Adding Some Sort of SideInputRunner

2016-05-03 Thread Kenneth Knowles
the DoFn) > while they both seem to be using the same StateInternals from the step > context. How does that work? > > Cheers, > Aljoscha > > On Thu, 28 Apr 2016 at 20:05 Kenneth Knowles > wrote: > > > On Thu, Apr 28, 2016 at 10:19 AM, Aljoscha Krettek > > wrot

Re: [DISCUSS] Adding Some Sort of SideInputRunner

2016-04-28 Thread Kenneth Knowles
On Thu, Apr 28, 2016 at 10:19 AM, Aljoscha Krettek wrote: > No worries :-) and thanks for the detailed answers! > > I still have one question, though: you wrote that "The side input is > considered ready when there has been any data output/added to the > PCollection that it is being read as a sid

Re: [DISCUSS] Adding Some Sort of SideInputRunner

2016-04-28 Thread Kenneth Knowles
On Thu, Apr 28, 2016 at 1:26 AM, Aljoscha Krettek wrote: > Bump. > > I'm afraid this might have gotten lost during the conferences/summits. > > On Thu, 21 Apr 2016 at 13:30 Aljoscha Krettek wrote: > > > Ok, I'll try and start such a design. Before I can start, I have a few > > questions about ho

Re: [DISCUSS] Adding Some Sort of SideInputRunner

2016-04-20 Thread Kenneth Knowles
Hi Aljoscha, Great idea. - The logic for matching up the windows is WindowFn#getSideInputWindow [1] - The SDK used to have something along the lines of what you describe [2] but we thought it was too runner-specific, directly referencing Dataflow details, and with a particular model of bufferin

Re: [PROPOSAL] Pipeline Runner API design doc

2016-04-18 Thread Kenneth Knowles
ests/commits. I'm quite > > interested in how this would go. It would also not allow user-written > > triggers anymore, correct? > > > > Cheers, > > Aljoscha > > > > On Thu, 24 Mar 2016 at 07:41 Jean-Baptiste Onofré > wrote: > > > >> H

Re: A question about windowed values

2016-04-13 Thread Kenneth Knowles
Good thread. Filed as https://issues.apache.org/jira/browse/BEAM-191. On Wed, Apr 13, 2016 at 10:08 AM, Amit Sela wrote: > First of all, Thanks for the detailed explanation! > > I can say that from my point of view (as a runner developer) this is > definitely confusing, especially discovering th

Re: A question about windowed values

2016-04-13 Thread Kenneth Knowles
It is fine to create a WindowedValue carrying no windows when it is a fully reified WindowedValue. It is when it becomes an element in a PCollection that a value must exist within some window. In a PCollection> you can have elements that do not *contain* any windows, but exist *within* some window

Re: TextIO.Read.Bound vs Create

2016-04-13 Thread Kenneth Knowles
This seems wrong. They should both be in the global window. I think your trouble is https://github.com/apache/incubator-beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java#L472 On Tue, Apr 12, 2016 at 9:43 PM, Amit Sela wrote: > Why inp

Re: PROPOSAL: Apache Beam (virtual) meeting: 05/11/2016 08:00 - 11:00 Pacific time

2016-04-12 Thread Kenneth Knowles
Either works for me. Thanks James! On Tue, Apr 12, 2016 at 11:31 AM, Amit Sela wrote: > Anytime works for me. > > On Tue, Apr 12, 2016, 21:24 Jean-Baptiste Onofré wrote: > > > Hi James, > > > > 5/4 works for me ! > > > > Thanks, > > Regards > > JB > > > > On 04/12/2016 05:05 PM, James Malone wr

Re: Add Beam up to https://analysis.apache.org?

2016-04-12 Thread Kenneth Knowles
itin > [1] > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20INFRA%20AND%20text%20~%20%22Sonar%22 > > > > > On Wed, Mar 30, 2016 at 1:43 PM, Kenneth Knowles > wrote: > > > Something to look forward to, then :-) > > > > But I also note that

Re: [PROPOSAL] Nightly builds by Jenkins

2016-04-05 Thread Kenneth Knowles
On Tue, Apr 5, 2016 at 1:57 AM, Maximilian Michels wrote: > > - Test coverage of pull requests (beam_PreCommit) > - Test coverage of the master and all other branches (beam_MavenVerify) > - A daily job that deploys artifacts to the snapshot repository > (beam_Nightly) > I like this. Everything he

Re: Add Beam up to https://analysis.apache.org?

2016-03-30 Thread Kenneth Knowles
> In the meantime, following could be used for project stats for mails [1] > and git commits [2]. > > Nitin > [1] http://markmail.org/search/?q=Apache+Beam > [2] https://biterg.io > > > On Tue, Mar 29, 2016 at 10:09 AM, Kenneth Knowles > wrote: > > > This site look

Add Beam up to https://analysis.apache.org?

2016-03-29 Thread Kenneth Knowles
This site looks very fun, possibly enlightening. Not urgent at all, but is there just a bit to flip to get Beam added? Kenn

[PROPOSAL] Pipeline Runner API design doc

2016-03-23 Thread Kenneth Knowles
Hi everyone, Incorporating the feedback from the 1-pager I circulated a week ago, I have put together a concrete design document for the new API(s). https://docs.google.com/document/d/1bao-5B6uBuf-kwH1meenAuXXS0c9cBQ1B2J59I3FiyI/edit?usp=sharing I appreciate any and all feedback on the design.

Re: Capability matrix question

2016-03-23 Thread Kenneth Knowles
+1 to considering "metric" / PMetric / etc. On Wed, Mar 23, 2016 at 8:09 AM, Amit Sela wrote: > How about "PMetric" ? > > On Wed, Mar 23, 2016, 16:53 Frances Perry wrote: > >> Perhaps I'm unclear on what an “Aggregator” is. I assumed that a line such as the following: PColle

Re: Renaming process: first step Maven coordonates

2016-03-22 Thread Kenneth Knowles
+1 JB. If it works for other incubating projects, then I'm happy to proceed. On Mon, Mar 21, 2016 at 1:00 PM, Jean-Baptiste Onofré wrote: > Hi Ben, > > it works fine with Maven >= 3.2.x (current version is 3.3.9). > > Most of incubator projects use x.x.x-incubating-SNAPSHOT: > > > https://git1-

Re: Renaming process: first step Maven coordonates

2016-03-21 Thread Kenneth Knowles
Many issues have been raised here and I cannot tell the direction people are working now on the PR. So here is my current thinking, which may be a +1 to some things or may be different. 1. Versions - This seems odd: 0.1.0-incubating < 0.1.0-SNAPSHOT It mostly shouldn't matter but seems better

Re: Committer workflow

2016-03-21 Thread Kenneth Knowles
+1 for emphasizing the knowledge-sharing aspect of review. I think it is the most important for project health, and the most fun too! I love the chance to learn about a new piece of code (or learn how I messed up in my own code :-)

Re: Revision to Pipeline Runner API: 1-Pager

2016-03-19 Thread Kenneth Knowles
Hi all, Since I haven't heard any major objections or suggestions in a little while, I'm going to go ahead and put together an actual design doc to share that fleshes out the proposed work a bit more. Kenn On Tue, Mar 15, 2016 at 8:28 AM, Kenneth Knowles wrote: > Hi everyone, >

Revision to Pipeline Runner API: 1-Pager

2016-03-15 Thread Kenneth Knowles
Hi everyone, I have put together a transient "1-pager" about revisions to the PipelineRunner API. The target audience is mostly SDK authors and pipeline runner authors. The goal of the doc is to scope the revision well. Please do comment on things that I missed, or things that aren't really pain

Sorry for un-fixed-up PR merge

2016-03-10 Thread Kenneth Knowles
I want to apologize for leaving fixup commits in a PR merge I just performed. I'm leaving as-is rather than mess about with `git push -f` to rewrite a prettier history. Just don't want anyone to think that I would normally go about like that. Kenn

Re: New beam website!

2016-03-07 Thread Kenneth Knowles
Thanks! Note that when someone clones for the first time, it will fail to checkout the default `master` branch. You must manually `git checkout asf-site`. I had a moment of confusion. If there is a technical reason not to use the master branch, the default can be set like so: $ git symbolic-r

Re: [Google Dataflow SDK] Build fails

2016-03-07 Thread Kenneth Knowles
egards, > Minudika > > Minudika Malshan > Undergraduate > Department of Computer Science and Engineering > University of Moratuwa. > > > > > On Mon, Mar 7, 2016 at 11:06 PM, Kenneth Knowles > wrote: > > > Yup, I'm checking it out. I have not reproduc

Re: [Google Dataflow SDK] Build fails

2016-03-07 Thread Kenneth Knowles
Yup, I'm checking it out. I have not reproduced it yet. On Mon, Mar 7, 2016 at 9:16 AM, Davor Bonaci wrote: > Minudika's issue is the following: > > [WARNING] Unused declared dependencies found: > [...] > [ERROR] Failed to execute > goal org.apache.maven.plugins:maven-dependency-plugin:2.8:analy

Reference document for lateness concept in Beam

2016-03-01 Thread Kenneth Knowles
Hi everyone, I have prepared a Beam version of our reference document for the concept of lateness. The target audience is runner implementers. The following link to the document is world-commentable: https://docs.google.com/document/d/12r7frmxNickxB5tbpuEh_n35_IJeVZn1peOrBrhhP6Y/edit?usp=sharing

Re: Apache Beam logo proposal

2016-02-18 Thread Kenneth Knowles
Love it! On Thu, Feb 18, 2016 at 7:38 AM, James Malone < jamesmal...@google.com.invalid> wrote: > Absolutely! > > Here is a link to the full proposed logo > < > https://drive.google.com/file/d/0B-IhJZh9Ab52MmlkWHk2bWQ3bW8/view?usp=sharing > >. > Here is the proposed condensed (icon) logo > < > ht

Re: Apache Beam blog

2016-02-12 Thread Kenneth Knowles
+1 and +1 to getting a snazzy logo :-) On Fri, Feb 12, 2016 at 10:14 AM, James Malone < jamesmal...@google.com.invalid> wrote: > Yes, absolutely. Especially once we have a logo. :) Right now it's probably > slightly gloomy-looking. > > On Fri, Feb 12, 2016 at 10:12 AM, Maximilian Michels > wrote

Re: status update

2016-02-12 Thread Kenneth Knowles
On Thu, Feb 11, 2016 at 8:53 AM, Maximilian Michels wrote: > As for the /develop branch, I would suggest to > make it mandatory to have it in a usable state at all times. > +1 If breakage is accidentally committed (as will happen) then a CTR rollback is encouraged. Kenn
