Re: Confusing about the bouded naming of PubsubIO

2016-02-18 Thread Ben Chambers
Classes named "Bound" are used throughout the sdk to describe builders that are specified enough to be applied. It indicates that the required parameters have been bound. It is not related to whether the output PCollection is bounded or unbounded. On Thu, Feb 18, 2016, 7:42 AM bakey pan wrote: >
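A minimal sketch of the naming convention described above, assuming a simple two-stage builder. `ExampleRead`, `Unbound`, `Bound`, `withLabel` and `topic` are hypothetical names, not PubsubIO's actual API. "Bound" only means the required topic has been supplied. It says nothing about whether the output is bounded.

```java
import java.util.Objects;

// Hypothetical builder illustrating "Unbound" vs. "Bound" naming.
// None of these classes or methods are Beam APIs.
final class ExampleRead {
  private ExampleRead() {}

  /** Starts a builder with only an optional setting; the required topic is still missing. */
  static Unbound withLabel(String label) {
    return new Unbound(label);
  }

  /** Supplying the required topic right away yields a Bound builder. */
  static Bound topic(String topic) {
    return new Unbound(null).topic(topic);
  }

  /** Optional settings only; cannot be applied yet. */
  static final class Unbound {
    final String label;

    Unbound(String label) {
      this.label = label;
    }

    /** Binding the required parameter produces something that can be applied. */
    Bound topic(String topic) {
      return new Bound(Objects.requireNonNull(topic), label);
    }
  }

  /** All required parameters are bound; unrelated to bounded vs. unbounded output. */
  static final class Bound {
    final String topic;
    final String label;

    Bound(String topic, String label) {
      this.topic = topic;
      this.label = label;
    }

    /** Stand-in for applying the transform to a pipeline. */
    String apply() {
      return "read from " + topic;
    }
  }

  public static void main(String[] args) {
    Bound read = ExampleRead.withLabel("ts").topic("projects/p/topics/t");
    System.out.println(read.apply()); // read from projects/p/topics/t
  }
}
```

Only `Bound` exposes `apply()`, so the type system prevents applying a builder whose required parameters are missing.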

Design document for Static Display Data

2016-03-14 Thread Ben Chambers
Hi! The following document describes work that we're planning on doing to allow every step in a pipeline to include more information about what is actually going on. The goal is to allow UIs and diagnostic tools to display details about what is happening inside each step by including details what

Re: Design document for Static Display Data

2016-03-15 Thread Ben Chambers
ent greatly describes the data display. Do you have any plan to > implement kind of "checkpoint"/alerting depending of some predicates on > the data (it's something that I had in mind for the Beam data > integration DSL) ? It's maybe the TRIGGER type ? > > Thanks again

Re: Renaming process: first step Maven coordonates

2016-03-21 Thread Ben Chambers
1. Regarding "java" as a module -- are we sure that other languages will be packaged using Maven as well? For instance, Python has its own ecosystem which likely doesn't play well with Maven. 2. Using the literal "SNAPSHOT" as the qualifier has special meaning in Maven -- it is newer than all other

Re: Renaming process: first step Maven coordonates

2016-03-21 Thread Ben Chambers
ating for a release (it's what I did in > the PR). Like this, the Maven standards are still valid. > > Regards > JB > > On 03/21/2016 06:20 PM, Ben Chambers wrote: > > 1. Regarding "java" as a module -- are we sure that other languages will > be

Re: Draft Contribution Guide

2016-03-23 Thread Ben Chambers
My concern with that is we aren't making clear what constitutes "whenever possible". Could we more concretely define that (e.g., "for example, when Github is down")? Were there specific cases that you had in mind? Otherwise, I worry about the ambiguity introduced and the possibility for different pe

Re: [PROPOSAL] Writing More Expressive Beam Tests

2016-03-25 Thread Ben Chambers
My only concern is that in the example, you first need to declare all the inputs, then the pipeline to be tested, then all the outputs. This can lead to tests that are hard to follow, since what you're really testing is an interleaving more like "When these inputs arrive, I get this output. Then wh
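A minimal sketch of the interleaved style argued for above, assuming a toy harness rather than any proposed Beam testing API. `InterleavedTest` and `whenArrive` are hypothetical, and a running sum stands in for the pipeline under test. Each step states the inputs that arrive and the output expected right after them, so the test reads in event order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical harness: inputs and expected outputs are declared together, step by step.
final class InterleavedTest {
  private final Function<List<Integer>, Integer> stage; // system under test
  private final List<Integer> seen = new ArrayList<>();
  private final List<String> failures = new ArrayList<>();

  InterleavedTest(Function<List<Integer>, Integer> stage) {
    this.stage = stage;
  }

  /** Feeds some inputs, then immediately checks the output they should produce. */
  InterleavedTest whenArrive(List<Integer> inputs, int expected) {
    seen.addAll(inputs);
    int actual = stage.apply(seen);
    if (actual != expected) {
      failures.add("after " + seen + " expected " + expected + " got " + actual);
    }
    return this;
  }

  List<String> failures() {
    return failures;
  }

  public static void main(String[] args) {
    // A running sum stands in for the pipeline being tested.
    Function<List<Integer>, Integer> sum = xs -> xs.stream().mapToInt(Integer::intValue).sum();
    List<String> failures = new InterleavedTest(sum)
        .whenArrive(List.of(1, 2), 3) // when these inputs arrive, I get 3
        .whenArrive(List.of(4), 7)    // then when 4 arrives, I get 7
        .failures();
    System.out.println(failures.isEmpty() ? "pass" : failures);
  }
}
```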

Re: [PROPOSAL] Writing More Expressive Beam Tests

2016-03-31 Thread Ben Chambers
On Mon, Mar 28, 2016 at 4:29 PM Robert Bradshaw wrote: > On Fri, Mar 25, 2016 at 4:28 PM, Ben Chambers > wrote: > > My only concern is that in the example, you first need to declare all the > > inputs, then the pipeline to be tested, then all the outputs. This can > lead

Re: Where's my PCollection.map()?

2016-05-31 Thread Ben Chambers
Nice post Robert! Thanks for writing up the rationale. On Fri, May 27, 2016 at 12:38 PM Robert Bradshaw wrote: > Hi all! > > One of the questions that often gets asked is why Beam has PTransforms > for everything instead of having methods on PCollection. This morning > I published a blog post ex

Re: DoFn Reuse

2016-06-08 Thread Ben Chambers
I think there is a difference: - If failure occurs after finishBundle() but before the consumption is committed, then the bundle may be reprocessed, which leads to duplicated calls to processElement() and finishBundle(). - If failure occurs after consumption is committed but before finishBundle(),
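A toy simulation of the first failure case above, assuming a simplified runner loop rather than any real Beam runner. `BundleReplay` and its methods are hypothetical. A crash after `finishBundle()` but before the input is committed causes the bundle to be retried, so `processElement()` and `finishBundle()` run twice for the same elements.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not a Beam runner): records every DoFn call made while processing a bundle.
final class BundleReplay {
  final List<String> calls = new ArrayList<>();

  void processBundle(List<String> bundle) {
    for (String e : bundle) {
      calls.add("processElement(" + e + ")");
    }
    calls.add("finishBundle");
  }

  /** Runs one bundle; if crashBeforeCommit is set, the first attempt fails before commit. */
  void run(List<String> bundle, boolean crashBeforeCommit) {
    boolean committed = false;
    int attempt = 0;
    while (!committed) {
      attempt++;
      processBundle(bundle);
      if (crashBeforeCommit && attempt == 1) {
        continue; // simulated failure: consumption not committed, so the bundle is retried
      }
      committed = true; // consumption committed; the bundle will not be replayed
    }
  }

  public static void main(String[] args) {
    BundleReplay r = new BundleReplay();
    r.run(List.of("a", "b"), true);
    System.out.println(r.calls);
  }
}
```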

Re: DoFn Reuse

2016-06-08 Thread Ben Chambers
On Wed, Jun 8, 2016 at 10:29 AM Raghu Angadi wrote: > On Wed, Jun 8, 2016 at 10:13 AM, Ben Chambers > > wrote: > > > - If failure occurs after finishBundle() but before the consumption is > > committed, then the bundle may be reprocessed, which leads to duplicated >

Re: [VOTE] Release version 0.1.0-incubating

2016-06-09 Thread Ben Chambers
+1 (binding) Excited to see the first release underway! On Thu, Jun 9, 2016 at 8:48 AM Kenneth Knowles wrote: > +1 (binding) > > Confirmed that we can run pipelines on Dataflow. > > Looks good. Very exciting! > > > On Thu, Jun 9, 2016 at 8:16 AM, Jean-Baptiste Onofré > wrote: > > > Team work !

[DISCUSS] PTransform.named vs. named apply

2016-06-22 Thread Ben Chambers
Based on a recent PR (https://github.com/apache/incubator-beam/pull/468) I was reminded of the confusion around the use of .apply(transform.named(someName)) and .apply(someName, transform). This is one of things I’ve wanted to cleanup for a while. I’d like to propose a path towards removing this re

Re: [DISCUSS] PTransform.named vs. named apply

2016-06-23 Thread Ben Chambers
ore sense to name the application of a transform > > rather than the transform itself. (Still mulling on how best to do > > this with Python...) > > > > On Wed, Jun 22, 2016 at 9:27 PM, Jean-Baptiste Onofré > > wrote: > > > +1 > > > > > >

Re: [Proposal] Add waitToFinish(), cancel(), waitToRunning() to PipelineResult.

2016-07-21 Thread Ben Chambers
(Minor Issue: I'd propose waitUntilDone and waitUntilRunning rather than waitToRunning which reads oddly) The only reason to separate submission from waitUntilRunning would be if you wanted to kick off several pipelines in quick succession, then wait for them all to be running. For instance: Pipe
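A sketch of the "submit several, then wait for all to be running" pattern, assuming a hypothetical `ProposedPipelineResult` interface that uses the method names suggested in this thread. It is not an existing Beam API, and `FakeJob` simulates a runner with `CompletableFuture`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical interface using the proposed names; not part of Beam.
interface ProposedPipelineResult {
  void waitUntilRunning();
  void waitUntilDone();
}

// Fake job whose "running" and "done" states complete asynchronously.
final class FakeJob implements ProposedPipelineResult {
  final String name;
  final CompletableFuture<Void> running = new CompletableFuture<>();
  final CompletableFuture<Void> done = new CompletableFuture<>();

  private FakeJob(String name) {
    this.name = name;
  }

  /** Submission returns immediately; the job starts on another thread. */
  static FakeJob submit(ExecutorService executor, String name) {
    FakeJob job = new FakeJob(name);
    executor.submit(() -> {
      job.running.complete(null); // job has started running
      job.done.complete(null);    // ...and has finished
    });
    return job;
  }

  @Override public void waitUntilRunning() { running.join(); }
  @Override public void waitUntilDone() { done.join(); }
}

final class SubmitThenWait {
  /** Kicks off every job first, then waits for all of them to be running. */
  static int startAll(List<String> names) {
    ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, names.size()));
    try {
      List<ProposedPipelineResult> results = new ArrayList<>();
      for (String name : names) {
        results.add(FakeJob.submit(executor, name));
      }
      for (ProposedPipelineResult r : results) {
        r.waitUntilRunning();
      }
      return results.size();
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(startAll(List.of("p1", "p2", "p3")) + " pipelines running");
  }
}
```

Because submission and waiting are separate calls, the total startup latency is roughly that of the slowest pipeline rather than the sum of all of them.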

Re: [Proposal] Add waitToFinish(), cancel(), waitToRunning() to PipelineResult.

2016-07-21 Thread Ben Chambers
, which is the same amount of work to implement. Am I correct > > that everyone in this thread thinks this generality is just not the right > > thing for a user API? > > > > - This enum could probably use revision. I'd chose some combination of > >

Re: [PROPOSAL] Website page or Jira to host all current proposal discussion and docs

2016-08-07 Thread Ben Chambers
Would we use the same Jira to track the series of PRs implementing the proposal (if accepted) or would it be discussion only (possibly linked to the implementation tasks)? On Sun, Aug 7, 2016, 9:48 PM Frances Perry wrote: > I'm a huge fan of keeping all the details related to a topic in a releva

Re: OldDoFn - CounterSet replacement

2016-08-17 Thread Ben Chambers
Hi Thomas! On Tue, Aug 16, 2016 at 9:40 PM Thomas Weise wrote: > I'm trying to rebase a PR and adjust for the DoFn changes. > Can you elaborate on what you're trying to do (or send a link to the PR)? > CounterSet is gone and there is now AggregatorFactory and I'm looking to > fix an existing

Re: OldDoFn - CounterSet replacement

2016-08-17 Thread Ben Chambers
aseline and proper support for aggregators isn't part of it, it's > something I was planning to take up in subsequent round. > > Thomas > > > On Wed, Aug 17, 2016 at 8:14 AM, Ben Chambers > wrote: > > > Hi Thomas! > > > > On Tue, Aug 1

Remove legacy import-order?

2016-08-23 Thread Ben Chambers
When Beam was contributed it inherited an import order [1] that was pretty arbitrary. We've added org.apache.beam [2], but continue to use this ordering. Both Eclipse and IntelliJ default to grouping imports into alphabetic order. I think it would simplify development if we switched our checkstyle

Re: Remove legacy import-order?

2016-08-23 Thread Ben Chambers
sse Anderson > > wrote: > > > > > Please. That's the one that always trips me up. > > > > > > On Tue, Aug 23, 2016, 4:10 PM Ben Chambers > wrote: > > > > > > > When Beam was contributed it inherited an import order [1] that was >

Re: IntervalWindow toString()

2016-09-19 Thread Ben Chambers
I think this uses interval notation (http://www.mathwords.com/i/interval_notation.htm) to indicate that the interval includes the start time but not the end time. On Mon, Sep 19, 2016, 8:56 AM Jesse Anderson wrote: > The toString() to IntervalWindow starts with a square bracket and ends with > a parenthesis. Is
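A minimal sketch of the half-open interval that notation describes. `HalfOpenWindow` is a hypothetical class, not Beam's `IntervalWindow`. `[` marks an included bound and `)` an excluded one, so `[start, end)` contains `start` but not `end`.

```java
// Hypothetical half-open window over millisecond timestamps (not Beam's IntervalWindow).
final class HalfOpenWindow {
  final long start; // inclusive
  final long end;   // exclusive

  HalfOpenWindow(long start, long end) {
    if (end < start) {
      throw new IllegalArgumentException("end before start");
    }
    this.start = start;
    this.end = end;
  }

  /** Includes the start, excludes the end. */
  boolean contains(long timestamp) {
    return timestamp >= start && timestamp < end;
  }

  /** '[' = included bound, ')' = excluded bound. */
  @Override public String toString() {
    return "[" + start + ".." + end + ")";
  }

  public static void main(String[] args) {
    HalfOpenWindow w = new HalfOpenWindow(0, 60_000);
    System.out.println(w + " contains 0? " + w.contains(0)
        + ", contains 60000? " + w.contains(60_000));
  }
}
```

A practical benefit of half-open intervals is that adjacent windows such as `[0..60000)` and `[60000..120000)` never both claim the boundary timestamp.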

Simplifying User-Defined Metrics in Beam

2016-09-28 Thread Ben Chambers
I started looking at BEAM-147: “Rename Aggregator to [P]Metric”. Rather than renaming the existing concept I’d like to introduce Metrics as a simpler mechanism to provide information during pipeline execution (I have updated the issue accordingly). Here is what I'm thinking would lead to a simpler

Re: Simplifying User-Defined Metrics in Beam

2016-10-05 Thread Ben Chambers
make. As always, let me know if there are any questions or comments! -- Ben On Wed, Sep 28, 2016 at 5:05 PM Ben Chambers wrote: I started looking at BEAM-147: “Rename Aggregator to [P]Metric”. Rather than renaming the existing concept I’d like to introduce Metrics as a simpler mechanism

Re: Introducing a Redistribute transform

2016-10-11 Thread Ben Chambers
As Kenn points out, I think the nature of the Redistribute operation is to act as a hint (or requirement) to the runner that a certain distribution of the elements is desirable. In a perfect world, this wouldn't be necessary because every runner would be able to do exactly the right thing. Looking at the dif
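One common way to force a redistribution is to attach a random key to each element and group by it. The sketch below assumes that technique for illustration. It is not the proposed Redistribute API, and `RandomRedistribute` is a hypothetical name. Elements spread across `shards` buckets regardless of how the input was originally partitioned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Toy model of redistribution by random key (assumed technique, not a Beam API).
final class RandomRedistribute {
  static <T> List<List<T>> redistribute(List<T> input, int shards, long seed) {
    Random random = new Random(seed);
    List<List<T>> buckets = new ArrayList<>();
    for (int i = 0; i < shards; i++) {
      buckets.add(new ArrayList<>());
    }
    for (T element : input) {
      // The random key doubles as the bucket index ("group by key").
      buckets.get(random.nextInt(shards)).add(element);
    }
    return buckets;
  }

  public static void main(String[] args) {
    List<Integer> input = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      input.add(i);
    }
    for (List<Integer> bucket : redistribute(input, 4, 42L)) {
      System.out.println(bucket.size());
    }
  }
}
```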

Re: Simplifying User-Defined Metrics in Beam

2016-10-12 Thread Ben Chambers
e. It will also allow us to revisit them from a clean slate. Thoughts? On Thu, Oct 6, 2016 at 5:41 AM Aljoscha Krettek wrote: Hi, I'm currently in holidays but I'll put some thought into this and give my comments once I get back. Aljoscha On Wed, Oct 5, 2016, 21:51 Ben Chambers

Re: Specifying type arguments for generic PTransform builders

2016-10-13 Thread Ben Chambers
This is also a good reason to avoid overly general names like "from", "create" and "of". Instead, the option should be ".fromQuery(String query)", so we can add ".fromTable(...)". On Thu, Oct 13, 2016 at 4:55 PM Dan Halperin wrote: > For #3 -- I think we should be VERY careful there. You need to
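A small illustration of why specific factory names age better than a generic `from(String)`, assuming a hypothetical `ExampleDbRead` builder (not a Beam IO). A `String` could be either a query or a table name, and Java cannot have two `from(String)` methods, so a generic name blocks the later addition.

```java
// Hypothetical read builder; names are illustrative only.
final class ExampleDbRead {
  enum Kind { QUERY, TABLE }

  final Kind kind;
  final String value;

  private ExampleDbRead(Kind kind, String value) {
    this.kind = kind;
    this.value = value;
  }

  static ExampleDbRead fromQuery(String query) {
    return new ExampleDbRead(Kind.QUERY, query);
  }

  // Can be added later without ambiguity or breaking existing callers.
  static ExampleDbRead fromTable(String table) {
    return new ExampleDbRead(Kind.TABLE, table);
  }

  @Override public String toString() {
    return kind + ": " + value;
  }

  public static void main(String[] args) {
    System.out.println(ExampleDbRead.fromQuery("SELECT * FROM t"));
    System.out.println(ExampleDbRead.fromTable("project:dataset.t"));
  }
}
```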

Re: Simplifying User-Defined Metrics in Beam

2016-10-19 Thread Ben Chambers
> > > > On Wed, Oct 12, 2016 at 10:48 AM Kenneth Knowles wrote: > > > >> Huzzah! This is IMO a really great change. I agree that we can get > >> something in to allow work to continue, and improve the API as we learn. > >> > >> On Wed, O

Re: [ANNOUNCEMENT] New committers!

2016-10-21 Thread Ben Chambers
Congrats. +3! On Fri, Oct 21, 2016 at 3:34 PM Kenneth Knowles wrote: > Huzzah! > > I've personally enjoyed working together, and I am glad to extend this > acknowledgement and welcome this addition to the Beam community. > > Kenn > > On Fri, Oct 21, 2016 at 3:18 PM Davor Bonaci wrote: > > > Hi

Re: [DISCUSS] Using Verbs for Transforms

2016-10-26 Thread Ben Chambers
I also like Distinct since it doesn't make it sound like it modifies any underlying collection. RemoveDuplicates makes it sound like the duplicates are removed, rather than a new PCollection without duplicates being returned. On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré wrote: > Agree. It
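A plain-Java analogy (not Beam code) for the naming point: like a PTransform, `distinct` below builds a new collection and leaves its input untouched, which the name "Distinct" conveys better than "RemoveDuplicates". `DistinctDemo` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Returns a new list without duplicates; the input list is never modified.
final class DistinctDemo {
  static <T> List<T> distinct(List<T> input) {
    return new ArrayList<>(new LinkedHashSet<>(input)); // keeps first-seen order
  }

  public static void main(String[] args) {
    List<String> words = List.of("a", "b", "a", "c", "b");
    List<String> unique = distinct(words);
    System.out.println(words);  // [a, b, a, c, b]  (input unchanged)
    System.out.println(unique); // [a, b, c]
  }
}
```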

Re: [DISCUSS] ExecIO

2016-12-05 Thread Ben Chambers
One option would be to use the reflective DoFn approach to this. Imagine something like:

    public class MyExternalFn extends DoFn {
      @ProcessElement
      // Existence of ShellExecutor indicates the code shells out.
      public void processElement(ProcessContext c, ShellExecutor shell) {
        ... Futur
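A sketch of the reflective idea, assuming hypothetical stand-ins for Beam's machinery: `@ProcessElementSketch`, `ShellExecutor` and `SketchContext` are illustrative types, not Beam APIs. The runner inspects the annotated method's parameter types and learns that the fn needs to shell out when it sees a `ShellExecutor`.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical stand-ins; Beam's real annotation and context types are not used here.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ProcessElementSketch {}

interface ShellExecutor {}

interface SketchContext {}

class PlainFn {
  @ProcessElementSketch
  public void processElement(SketchContext c) {}
}

class MyExternalFn {
  @ProcessElementSketch
  // Existence of ShellExecutor indicates the code shells out.
  public void processElement(SketchContext c, ShellExecutor shell) {}
}

final class FnInspector {
  /** True if the fn's annotated process method declares a ShellExecutor parameter. */
  static boolean needsShell(Class<?> fnClass) {
    for (Method m : fnClass.getMethods()) {
      if (m.isAnnotationPresent(ProcessElementSketch.class)
          && Arrays.asList(m.getParameterTypes()).contains(ShellExecutor.class)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println("PlainFn needs shell: " + needsShell(PlainFn.class));
    System.out.println("MyExternalFn needs shell: " + needsShell(MyExternalFn.class));
  }
}
```

The design choice is that the capability is declared by the method signature itself, so the runner can decide before execution whether it must provide a shell.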

Re: [DISCUSS] ExecIO

2016-12-05 Thread Ben Chambers
y functions (or perhaps regular language-specific facilities like > java's ProcessBuilder). > > On Mon, Dec 5, 2016 at 1:21 PM Ben Chambers wrote: > > One option would be to use the reflective DoFn approach to this. Imagine > something like: > > public class MyExternalFn exte

Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

2016-12-07 Thread Ben Chambers
+1 -- This seems like the best option. It's a mechanical change, and the compiler will let users know it needs to be made. It will make the mistake much less common, and when it occurs it will be much clearer what is wrong. It would be great if we could make the mis-use a compiler problem or a pip

Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

2016-12-07 Thread Ben Chambers
> through in the same release. > > Dan > > On Thu, Dec 8, 2016 at 7:48 AM, Aljoscha Krettek > wrote: > > > +1 > > > > I've seen this mistake myself in some PRs. > > > > On Thu, 8 Dec 2016 at 06:10 Ben Chambers > > wrote: > > >

Re: [DISCUSS] ExecIO

2016-12-08 Thread Ben Chambers
I think I agree with Robert (unless I'm misunderstanding his point). I think that the shell commands are going to be the most useful if it is possible to take the elements in an input PCollection, construct a shell command depending on those elements, and then execute it. I think doing so in a ful
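A minimal sketch of building a command from each input element and executing it, using the standard `ProcessBuilder`. The `echo` command template and the helper names `commandFor` and `run` are hypothetical. A real transform would also need timeouts, retries and output-size limits.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Per-element execution: element -> command -> stdout as the output element.
final class PerElementExec {
  /** Builds the argument list from the element; no shell is involved, so no quoting issues. */
  static List<String> commandFor(String element) {
    return List.of("echo", "processed:" + element);
  }

  /** Runs one command and returns its trimmed stdout; fails on a non-zero exit code. */
  static String run(List<String> command) throws IOException, InterruptedException {
    Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
    String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
    int exit = process.waitFor();
    if (exit != 0) {
      throw new IOException("command " + command + " exited with " + exit);
    }
    return output.trim();
  }

  public static void main(String[] args) throws Exception {
    List<String> outputs = new ArrayList<>();
    for (String element : List.of("a", "b")) {
      outputs.add(run(commandFor(element)));
    }
    System.out.println(outputs); // [processed:a, processed:b]
  }
}
```

Passing an argument list instead of a single command string avoids shell interpretation of element contents.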