Re: TextIO binary file

2017-01-31 Thread Robert Bradshaw
On Tue, Jan 31, 2017 at 12:04 PM, Aviem Zur wrote: > +1 on what Stas said. > I think there is value in not having the user write a custom IO for a > protocol they use which is not covered by Beam IOs. Plus having them deal > with not only the encoding but also the IO part is

Re: Should you always have a separate PTransform class for a new transform?

2017-02-08 Thread Robert Bradshaw
chov < > > > > > > > kirpic...@google.com.invalid> wrote: > > > > I must admit I didn't quite > > > > understand the option of "implements CombiningTransform". > > > > > > > > > > On Tue, Feb 7, 2017 at 9:0

Re: TextIO binary file

2017-02-06 Thread Robert Bradshaw
cala#L1512 > > > > > < > > > > > > > > > https://github.com/apache/spark/blob/master/core/src/ > main/scala/org/apache/spark/rdd/RDD.scala#L1512 > > > > > > > > > > > > > > > The merit for something li

Re: Should you always have a separate PTransform class for a new transform?

2017-02-07 Thread Robert Bradshaw
On Tue, Feb 7, 2017 at 7:49 PM, Kenneth Knowles wrote: > I am +0.7 on this idea. My rationale is contained in this thread, but I > thought I would paraphrase it anyhow: > > "You automatically get all the features of Combine" / "If you add a feature > to Combine you have

Re: Should you always have a separate PTransform class for a new transform?

2017-02-07 Thread Robert Bradshaw
that non-pure combines (e.g. Count.perElement(), RemoveDuplicates(), etc.) could implement this interface as well. There are some to-be-worked-out details, such as the fact that CombiningTransform can't extend PTransform as PTransform is an abstract class, not an interface, that might make this messier

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Robert Bradshaw
First off, let me say that a *correctly* batching DoFn is a lot of value, especially because it's (too) easy to (often unknowingly) implement it incorrectly. My take is that a BatchingParDo should be a PTransform that takes a DoFn, ? extends Iterable> as a parameter, as

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-27 Thread Robert Bradshaw
On Fri, Jan 27, 2017 at 6:55 AM, Etienne Chauchot <echauc...@gmail.com> wrote: > Hi Robert, > > On 26/01/2017 at 18:17, Robert Bradshaw wrote: >> >> First off, let me say that a *correctly* batching DoFn is a lot of >> value, especially because it's (too) easy to

Re: Conceptually, what are bundles?

2017-01-25 Thread Robert Bradshaw
Bundles are simply the unit of commitment (retry) in the Beam SDK. They're not really a model concept, but do leak from the implementation into the API as it's not feasible to checkpoint every individual process call, and this allows some state/compute/... to be safely amortized across elements

Re: Beam Fn API

2017-01-20 Thread Robert Bradshaw
Also, note that we can still support the "simple" case. For example, if the user supplies us with a jar file (as they do now) a runner could launch it as a subprocesses and communicate with it via this same Fn API or install it in a fixed container itself--the user doesn't *need* to know about

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Robert Bradshaw
On Thu, Jan 26, 2017 at 6:58 PM, Kenneth Knowles <k...@google.com.invalid> wrote: > On Thu, Jan 26, 2017 at 4:15 PM, Robert Bradshaw < > rober...@google.com.invalid> wrote: > >> On Thu, Jan 26, 2017 at 3:42 PM, Eugene Kirpichov >> <kirpic...@google.com.inva

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Robert Bradshaw
On Thu, Jan 26, 2017 at 4:20 PM, Ben Chambers wrote: > Here's an example API that would make this part of a DoFn. The idea here is > that it would still be run as `ParDo.of(new MyBatchedDoFn())`, but the > runner (and DoFnRunner) could see that it has asked for

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Robert Bradshaw
= c.element(); >> ... >> } >> } >> >> Possible API making this part of DoFn (with dynamic size): >> >> public MyBatchedDoFn extends DoFn<I, O> { >> @ProcessBatch >> public boolean processBatch(ProcessContext c) { >>

Re: [ANNOUNCEMENT] New committers, January 2017 edition!

2017-01-26 Thread Robert Bradshaw
Welcome and congratulations! On Thu, Jan 26, 2017 at 5:05 PM, Sourabh Bajaj wrote: > Congrats!! > > On Thu, Jan 26, 2017 at 5:02 PM Jason Kuster > wrote: > >> Congrats all! Very exciting. :) >> >> On Thu, Jan 26, 2017 at 4:48 PM,

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Robert Bradshaw
On Thu, Jan 26, 2017 at 3:42 PM, Eugene Kirpichov wrote: > I agree that wrapping the DoFn is probably not the way to go, because the > DoFn may be quite tricky due to all the reflective features: e.g. how do > you automatically "batch" a DoFn that uses state and

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Robert Bradshaw
k the "make batches of at most N but don't wait too long if you don't get to N" is a very useful first (and tractable) start that can be built on. > On Thu, Jan 26, 2017 at 3:01 PM Robert Bradshaw <rober...@google.com.invalid> > wrote: > >> On Thu, Jan 26, 2017 at 12:48 PM
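The "at most N, but don't wait too long" policy mentioned above can be sketched in plain Python. This is an illustration only, not the Beam API: `batch_elements`, `max_size`, and `max_wait` are hypothetical names, and timestamps are passed in explicitly instead of being read from a clock or a timer.

```python
def batch_elements(timed_elements, max_size, max_wait):
    """Group (timestamp, element) pairs into batches (hypothetical helper).

    A batch is emitted when it reaches max_size elements, or when the next
    element arrives more than max_wait after the batch's first element.
    """
    batch = []
    batch_start = None
    for timestamp, element in timed_elements:
        # Flush a stale batch before accepting the new element.
        if batch and timestamp - batch_start > max_wait:
            yield batch
            batch = []
        if not batch:
            batch_start = timestamp
        batch.append(element)
        # Flush as soon as the size limit is reached.
        if len(batch) == max_size:
            yield batch
            batch = []
    # Emit whatever is left at the end of the input.
    if batch:
        yield batch
```

In this sketch a stale batch is only flushed when the next element shows up; a timer-driven version would presumably flush it without waiting for further input.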

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-01-26 Thread Robert Bradshaw
On Thu, Jan 26, 2017 at 12:48 PM, Eugene Kirpichov wrote: > I don't think we should make batching a core feature of the Beam > programming model (by adding it to DoFn as this code snippet implies). I'm > reasonably sure there are less invasive ways of implementing

Re: Better developer instructions for using Maven?

2017-02-22 Thread Robert Bradshaw
On Wed, Feb 22, 2017 at 7:51 AM, Jean-Baptiste Onofré wrote: > Thanks Kenn, it's perfectly clear now ;) > That was Kenn's vote. I'm of the opposite opinion (at least I think checkstyle should be done by default, possibly others). It's clear many people aren't very happy with

Re: Better developer instructions for using Maven?

2017-02-10 Thread Robert Bradshaw
IMO) decide to go > > do something else. > > > > Folks other than newcomers can learn a repertoire of commands, like > Robert > > says. So we shouldn't consider them (aka "us") so much when deciding > > whether "fast" or "slow" is the default,

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-02-09 Thread Robert Bradshaw
On Wed, Feb 8, 2017 at 10:48 AM, Kenneth Knowles wrote: > Hi Etienne, > > If the timer is firing n times for n elements, that's a bug in the runner / > shared runner code. It should be deduped. Which runner? Can you file a JIRA > against me to investigate? I'm still in

Re: [DISCUSS] Python SDK status and next steps

2017-01-20 Thread Robert Bradshaw
a single command >> > > using Maven today. >> > > - Publishing the artifacts to a central repository such as PyPI. >> > > >> > >> > I'm more than happy to help on this. We left on purpose some things open >> > when we added Maven suppor

Re: Beam File System in the Python SDK

2017-03-01 Thread Robert Bradshaw
Much needed! Added a couple of comments. On Wed, Mar 1, 2017 at 3:08 PM, Sourabh Bajaj < sourabhba...@google.com.invalid> wrote: > Hi, > > BEAM-1441 is a ticket > for > implementing the Beam File System in the Python SDK similar to the one >

Re: Better developer instructions for using Maven?

2017-01-09 Thread Robert Bradshaw
On Mon, Jan 9, 2017 at 3:49 AM, Aljoscha Krettek wrote: > I also usually prefer "mvn verify" to do the expected thing but I see that > quick iteration times are key. I see https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html verify - run any

Re: Creating Sum.[*]Fn instances

2016-12-22 Thread Robert Bradshaw
I was about to comment the same. Generally the CombineFns are more composable units than the global and per-key wrappings; it's not clear why we favor the latter for some Combiners. On Thu, Dec 22, 2016 at 9:59 AM, Ben Chambers wrote: > Don't they need to be visible for use

Re: [jira] [Commented] (BEAM-1261) State API should allow state to be managed in different windows

2017-03-23 Thread Robert Bradshaw
I like the idea of being able to use WindowMappingFns to access state across windows in a manner similar to how side inputs are accessed. On Wed, Mar 22, 2017 at 9:56 PM, Kenneth Knowles (JIRA) wrote: > > [ https://issues.apache.org/jira/browse/BEAM-1261?page= >

Re: [PROPOSAL] @OnWindowExpiration

2017-03-28 Thread Robert Bradshaw
Another alternative is to be able to set special timers, e.g. end of window and expiration of window. That at least addresses (2). On Tue, Mar 28, 2017 at 1:27 PM, Kenneth Knowles wrote: > Hi all, > > I have a little extension to the stateful DoFn annotations to

Re: [DISCUSS] Change "RunnableOnService" To A More Intuitive Name

2017-03-27 Thread Robert Bradshaw
d the rename from RunnableOnService to ValidatesRunner in > > the > > > > Java codebase (Python was already there) > > > > https://github.com/apache/beam/pull/2157. > > > > > > > > I'm sure there will be stragglers throughout our docs, etc, so ple

Re: Splittable DoFn for Python SDK

2017-03-14 Thread Robert Bradshaw
+1, I think this is a natural extension of the SDF to Python. On Tue, Mar 14, 2017 at 1:19 PM, Chamikara Jayalath wrote: > Thanks Eugene. Will keep you cc'd. > > - Cham > > On Tue, Mar 14, 2017 at 1:15 PM Eugene Kirpichov > wrote: > >> Thanks Cham!

Re: Style: how much testing for transform builder classes?

2017-03-21 Thread Robert Bradshaw
re comprehensive than manual tests...)? AutoValue like alleviates many, but not all, of these concerns - as Ismael > points out. > If two features are not orthogonal, that perhaps merits more test (and documentation). > > > > On Tue, Mar 21, 2017 at 1:18 PM, Robert Bradshaw

Re: Style: how much testing for transform builder classes?

2017-03-21 Thread Robert Bradshaw
On Wed, Mar 15, 2017 at 2:11 AM, Ismaël Mejía wrote: > +1 to Vikas point maybe the right place to enforce things correct > build tests is in the validate and like this reduce the test > boilerplate and only test the validate, but I wonder if this totally > covers both cases

Re: Python build artifacts seem to be misconfigured

2017-04-11 Thread Robert Bradshaw
We should also ignore them: https://github.com/apache/beam/pull/2494 On Thu, Apr 6, 2017 at 6:45 PM, Kenneth Knowles wrote: > Thanks for the pointer. I'll dig in to tox docs to see why this isn't > happening. Probably something to do with unclean shutdowns. > > On Thu,

Re: Renaming SideOutput

2017-04-11 Thread Robert Bradshaw
+1, I think this is a lot clearer. On Tue, Apr 11, 2017 at 2:24 PM, Stephen Sisk wrote: > strong +1 for changing the name away from sideOutput - the fact that > sideInput and sideOutput are not really related was definitely a source of > confusion for me when learning

Re: Renaming SideOutput

2017-04-11 Thread Robert Bradshaw
<k...@google.com.invalid> wrote: > +1 ditto about sideInput and sideOutput not actually being related > > On Tue, Apr 11, 2017 at 3:52 PM, Robert Bradshaw < > rober...@google.com.invalid> wrote: > >> +1, I think this is a lot clearer. >> >> On

Re: Question regarding loops in BEAM programs

2017-04-13 Thread Robert Bradshaw
There is no way (short of inspecting stack traces and bytecodes) for Beam to distinguish between for (int i=0; i<3; i++) { pc = pc.apply(MyTransform()); } from pc.apply(MyTransform()).apply(MyTransform()).apply(MyTransform()); However, PTransforms are hierarchal, so you
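The point about loops can be illustrated with a toy stand-in for a PCollection. This is a sketch under stated assumptions, not Beam code: `ToyCollection`, `build_with_loop`, and `build_chained` are hypothetical names, and transforms are represented as plain strings. It only shows that the recorded sequence of applications is identical either way, so the loop itself leaves no trace.

```python
class ToyCollection:
    """Hypothetical stand-in for a PCollection that records applied transforms."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def apply(self, transform_name):
        # Each application returns a new collection with one more recorded step.
        return ToyCollection(self.steps + (transform_name,))


def build_with_loop(pc):
    # Apply the same transform three times using a loop.
    for _ in range(3):
        pc = pc.apply("MyTransform")
    return pc


def build_chained(pc):
    # Apply the same transform three times by chaining calls.
    return pc.apply("MyTransform").apply("MyTransform").apply("MyTransform")
```

Comparing `build_with_loop(ToyCollection()).steps` with `build_chained(ToyCollection()).steps` gives the same tuple in both cases.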

Re: Naming of Combine.Globally

2017-04-18 Thread Robert Bradshaw
On Tue, Apr 18, 2017 at 3:03 AM, Wesley Tanaka wrote: > I believe that foldl in Haskell https://www.haskell.org/hoogle/?hoogle=foldl > admits a separate accumulator type from the type of the data structure being > "folded" > And, well, python lets you have your way

Re: [PROPOSAL] Remove KeyedCombineFn

2017-04-21 Thread Robert Bradshaw
Strongly in favor of removing this. If it's actually needed one can incorporate the key into the value for inspection in the various phases of the CombineFn, so it's no loss of expressiveness. It's perfectly reasonable to make this (rare) usecase more complicated to greatly simplify the common
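A rough illustration of folding the key into the value, so that an ordinary key-unaware combiner can still inspect it. This is a plain-Python sketch, not the Beam CombineFn API: `combine_per_key`, `with_key_in_value`, and `key_aware_sum` are hypothetical names, and the "double_" prefix rule is invented purely for the example.

```python
from collections import defaultdict


def combine_per_key(pairs, combine_fn):
    """Toy per-key combine: group values by key, then reduce each group."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: combine_fn(values) for key, values in grouped.items()}


def with_key_in_value(pairs):
    """Rewrite (key, value) as (key, (key, value)) so the combiner sees the key."""
    return [(key, (key, value)) for key, value in pairs]


def key_aware_sum(keyed_values):
    """Combiner that reads the key out of the values it is given.

    Invented rule for illustration: totals for keys starting with "double_"
    are doubled.
    """
    key = keyed_values[0][0]
    total = sum(value for _, value in keyed_values)
    return total * 2 if key.startswith("double_") else total
```

A real CombineFn works with accumulators and several phases; the sketch collapses all of that into a single reduce to keep the key-in-value idea visible.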

Re: Let's make Beam transforms comply with PTransform Style Guide

2017-03-03 Thread Robert Bradshaw
Here's a crazy idea: what if we had a virtual fixit/hackathon to knock these out (similar to the virtual meet-up, but with an agenda)? I find communal hacking sessions towards a common goal are a good way to get to know each other and get a lot done. Would there be any interest in this? On Wed,

Re: [VOTE] Release 0.6.0, release candidate #2

2017-03-11 Thread Robert Bradshaw
On Fri, Mar 10, 2017 at 9:05 PM, Ahmet Altay wrote: > Hi everyone, > > Please review and vote on the release candidate #2 for the version 0.6.0, > as follows: > [ ] +1, Approve the release > [ ] -1, Do not approve the release (please provide specific comments) > > > The

Re: Style: how much testing for transform builder classes?

2017-03-11 Thread Robert Bradshaw
+1 to reducing "trivial" tests such as these. More below. On Fri, Mar 10, 2017 at 7:53 PM, Kenneth Knowles wrote: > +0.5 > > Tests of trivial validation failures, if they check the error message, are > actually tests of effective communication in the error message

Re: [VOTE] Release 0.6.0, release candidate #2

2017-03-13 Thread Robert Bradshaw
On Sat, Mar 11, 2017 at 11:19 PM, Ahmet Altay <al...@google.com.invalid> wrote: > On Sat, Mar 11, 2017 at 11:48 AM, Robert Bradshaw < > rober...@google.com.invalid> wrote: > > > On Fri, Mar 10, 2017 at 9:05 PM, Ahmet Altay <al...@google.com.invalid>

Re: [VOTE] Release 0.6.0, release candidate #2

2017-03-13 Thread Robert Bradshaw
+1 (binding) On Mon, Mar 13, 2017 at 11:10 AM, Robert Bradshaw <rober...@google.com> wrote: > On Sat, Mar 11, 2017 at 11:19 PM, Ahmet Altay <al...@google.com.invalid> > wrote: > >> On Sat, Mar 11, 2017 at 11:48 AM, Robert Bradshaw < >> rober...@google.com.in

Re: Proposed API for a Whole File IO

2017-08-01 Thread Robert Bradshaw
On Tue, Aug 1, 2017 at 1:42 PM, Eugene Kirpichov < kirpic...@google.com.invalid> wrote: > Hi, > As mentioned on the PR - I support the creation of such an IO (both read > and write) with the caveats that Reuven mentioned; we can refine the naming > during code review. > Note that you won't be

Re: [DISCUSS] Beam pipeline logical and physical DAGs visualization.

2017-08-03 Thread Robert Bradshaw
Nice. In terms of shared data structures, we have https://github.com/apache/beam/blob/master/sdks/common/runner-api/src/main/proto/beam_runner_api.proto . Presumably a utility that converts this to a dot file would be quite useful. It might be interesting to experiment with different ways of

Re: Adding back PipelineRunner#apply method

2017-08-15 Thread Robert Bradshaw
On Tue, Aug 15, 2017 at 10:21 AM, Shen Li wrote: > Hi Thomas, > > Does it mean future Pipeline implementations would allow applications to > set the runner after a pipeline has been constructed? Correct, that's the intent. > > Thanks, > Shen > > On Tue, Aug 15, 2017 at

Re: [PROPOSAL] "Requires deterministic input"

2017-08-15 Thread Robert Bradshaw
On Tue, Aug 15, 2017 at 2:14 PM, Reuven Lax <re...@google.com.invalid> wrote: > On Tue, Aug 15, 2017 at 1:59 PM, Robert Bradshaw < > rober...@google.com.invalid> wrote: > >> On Sat, Aug 12, 2017 at 1:13 AM, Reuven Lax <re...@google.com.invalid> >> wrote

Re: [PROPOSAL] "Requires deterministic input"

2017-08-15 Thread Robert Bradshaw
On Sat, Aug 12, 2017 at 1:13 AM, Reuven Lax <re...@google.com.invalid> wrote: > On Fri, Aug 11, 2017 at 10:52 PM, Robert Bradshaw < >> The question here is whether the ordering is part of the "content" of >> an iterable. > > My initial instinct was to sa

Re: [PROPOSAL] "Requires deterministic input"

2017-08-11 Thread Robert Bradshaw
On Thu, Aug 10, 2017 at 1:53 PM, Reuven Lax wrote: > On Thu, Aug 10, 2017 at 1:07 PM, Kenneth Knowles > wrote: > >> > > >- Does it also imply fixed length and content for value >> iterators? >> > > > >> >> The concept of "value iterator"

Re: [ANNOUNCEMENT] New PMC members, August 2017 edition!

2017-08-11 Thread Robert Bradshaw
Congratulations! On Fri, Aug 11, 2017 at 2:23 PM, Jean-Baptiste Onofré wrote: > Congrats ! > > Regards > JB > > > On 08/11/2017 07:40 PM, Davor Bonaci wrote: >> >> Please join me and the rest of Beam PMC in welcoming the following >> committers as our newest PMC members. They

Re: Style of messages for checkArgument/checkNotNull in IOs

2017-08-11 Thread Robert Bradshaw
Huge +1 to the checkArgument(username != null, ...) style. A note on validate(), aren't we trying to remove pipeline options from PTransforms altogether (and, in addition, how does this even work with the Runner API and cross-language transforms). On Thu, Aug 10, 2017 at 4:59 PM, Eugene

Re: Requiring PTransform to set a coder on its resulting collections

2017-08-11 Thread Robert Bradshaw
> kirpic...@google.com.invalid> wrote: > >> I've updated the guidance in PTransform Style Guide on setting coders >> https://beam.apache.org/contribute/ptransform-style-guide/#coders >> according to this discussion. >> https://github.com/apache/beam-site/pull/

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-07-10 Thread Robert Bradshaw
Sorry, just saw https://github.com/apache/beam/pull/2211 On Mon, Jul 10, 2017 at 5:37 PM, Robert Bradshaw <rober...@google.com> wrote: > Any progress on this? > > On Thu, Mar 9, 2017 at 1:43 AM, Etienne Chauchot <echauc...@gmail.com> wrote: >> Hi all, >> >>

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-07-10 Thread Robert Bradshaw
Any progress on this? On Thu, Mar 9, 2017 at 1:43 AM, Etienne Chauchot wrote: > Hi all, > > We had a discussion with Kenn yesterday about point 1 below, I would like > to note it here on the ML: > > Using new method timer.set() instead of timer.setForNowPlus() makes the >

Re: [jira] [Commented] (BEAM-2573) Better filesystem discovery mechanism in Python SDK

2017-07-07 Thread Robert Bradshaw
Wouldn't you want to import filesystems to do validation of paths, etc? (Also, registering an imported object is less error-prone than registering a string.) On Fri, Jul 7, 2017 at 9:29 PM, Chamikara Jayalath (JIRA) wrote: > > [ >

Re: MergeBot is here!

2017-07-15 Thread Robert Bradshaw
kets for some of this >> stuff at some point, but am more likely to track via github issues on the >> mergebot repository for now). Comments welcome. :) >> >> https://docs.google.com/document/d/13D1nUgTeonyvNtRi4bJM- >> Vyj9YOCVHZT7QA6EOauKT4/edit >> >> On Wed, Jul

Re: Should Pipeline wait till all processing time timers fire before exit?

2017-07-25 Thread Robert Bradshaw
I generally agree, but it's unclear what to do with timers that are scheduled during the execution of existing timers. (For example, a "heartbeat" source may process a timer by emitting an element and scheduling a timer for the future. One would never be able to fire "all" timers. I suppose this

Re: Custom window merging

2017-07-27 Thread Robert Bradshaw
t the > standardization deserves attention and documentation. > These are already called out individually in the compatibility matrix, which probably makes sense as it allows a runner to declare "partial" support for windowing. On Wed, Jul 26, 2017 at 9:34 PM, Robert Bradshaw <

Re: Requiring PTransform to set a coder on its resulting collections

2017-07-27 Thread Robert Bradshaw
On Thu, Jul 27, 2017 at 10:04 AM, Kenneth Knowles wrote: > On Thu, Jul 27, 2017 at 2:22 AM, Lukasz Cwik > wrote: >> >> Ken/Robert, I believe users will want the ability to set the output coder >> because coders may have intrinsic properties

Re: Requiring PTransform to set a coder on its resulting collections

2017-07-26 Thread Robert Bradshaw
+1, I'm a huge fan of moving this direction. Right now there's also the ugliness that setCoder() may be called any number of times before a PCollection is used (the last setter winning) but is an error to call it once it has been used (and here "used" is not clear--if a PCollection is returned
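The setCoder() behaviour described above (any number of calls before use, last one wins, an error afterwards) can be modelled with a small toy class. This is a sketch of the described semantics only, not the Beam implementation: `ToyPCollection`, `set_coder`, and the choice of `RuntimeError` are assumptions, and "used" is simplified here to mean "a transform has been applied".

```python
class ToyPCollection:
    """Hypothetical model of a collection whose coder can be set until first use."""

    def __init__(self):
        self._coder = None
        self._used = False

    def set_coder(self, coder):
        # Setting is allowed repeatedly before use; the last call wins.
        if self._used:
            raise RuntimeError("cannot set coder after the collection is used")
        self._coder = coder
        return self

    def apply(self, transform_name):
        # Applying any transform marks this collection as used.
        self._used = True
        return ToyPCollection()

    @property
    def coder(self):
        return self._coder
```

The ambiguity raised in the message is exactly the part this sketch glosses over: what counts as "used" is fixed here by fiat.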

Re: Should Pipeline wait till all processing time timers fire before exit?

2017-07-26 Thread Robert Bradshaw
e about drain. > On Tue, Jul 25, 2017 at 5:34 PM, Eugene Kirpichov < > kirpic...@google.com.invalid> wrote: > >> Yes, and I think in this case the pipeline should never transition to DONE. >> >> On Tue, Jul 25, 2017 at 3:42 PM Robert Bradshaw >> <rober...

Re: Custom window merging

2017-07-26 Thread Robert Bradshaw
I think there may be a distinction between hard-coding support for the "standard" WindowFns (e.g. https://github.com/apache/beam/blob/master/sdks/common/runner-api/src/main/proto/standard_window_fns.proto) and accepting WindowFns as a UDF. Different runners have offered different levels of support

Re: MergeBot is here!

2017-07-12 Thread Robert Bradshaw
On Tue, Jul 11, 2017 at 7:14 PM, Kenneth Knowles <k...@google.com.invalid> wrote: > > On Tue, Jul 11, 2017 at 4:25 PM, Robert Bradshaw < > rober...@google.com.invalid> wrote: > > > On Tue, Jul 11, 2017 at 8:51 AM, Kenneth Knowles <k...@google.com.invalid>

Re: Passing pipeline options into PTransforms and Filesystems in Python

2017-07-11 Thread Robert Bradshaw
Templates, including ValueProviders, were recently added to the Python SDK. +1 to pursuing this train of thought (and as I mentioned on the bug, and has been mentioned here, we don't want to add PipelineOptions access to PTransforms/at construction time). On Tue, Jul 11, 2017 at 3:21 PM, Kenneth

Re: [PROPOSAL] External Join with KV Stores

2017-07-05 Thread Robert Bradshaw
I'm generally in favor of viewing these as seekable reads rather than an entirely new concept. Not sure how it would fit into the SDFs architecture. On Wed, Jul 5, 2017 at 10:27 AM, Lukasz Cwik wrote: > Yes, I was thinking the same thing about side inputs. Our current

Re: Status of our CI tools

2017-04-28 Thread Robert Bradshaw
On Fri, Apr 28, 2017 at 9:56 PM, Jean-Baptiste Onofré wrote: > +1 > > Travis is useless and our Jenkins is good IMHO ! Travis is really useful for the Python SDK, but I'm hopeful that soon Jenkins will be stable and quick enough that I won't miss it, and having only one CI to

Re: [jira] [Created] (BEAM-2729) Post commit fail: Input to GroupByKey must be of Tuple or Any type

2017-08-04 Thread Robert Bradshaw
Key: BEAM-2729 > URL: https://issues.apache.org/jira/browse/BEAM-2729 > Project: Beam > Issue Type: Bug > Components: sdk-py > Reporter: Ahmet Altay > Assignee: Robert Bradshaw > > > r

Re: [VOTE] Release 2.1.0, release candidate #3

2017-08-16 Thread Robert Bradshaw
+1 binding (I've been on vacation as well.) On Wed, Aug 16, 2017 at 8:50 AM, Lukasz Cwik wrote: > Back from vacation. > > +1 binding > > BEAM-2671 has been marked for 2.2.0 release. > > > > On Wed, Aug 16, 2017 at 2:08 AM, Kobi Salant wrote: >

Re: [Proposal] Progress Reporting in Fn API

2017-08-22 Thread Robert Bradshaw
I put together https://docs.google.com/document/d/1Dx18qBTvFWNqwLeecemOpKfleKzFyeV3Qwh71SHATvY/edit?usp=sharing which explains a bit how I think about progress and might be helpful. On Mon, Aug 21, 2017 at 10:22 AM, Vikas RK wrote: > Hi, > > I have updated the proposal >

Re: Issue with Coder documentation regarding context

2017-05-03 Thread Robert Bradshaw
I filed a https://issues.apache.org/jira/browse/BEAM-2166 simply removing these from the public API (for the reasons listed). We can always bring them back in a forward compatible way if it turns out that they're actually needed. On Thu, Feb 9, 2017 at 1:18 PM, Jean-Baptiste Onofré

Re: TextIO and .withWindowedWrites() - filenamepolicy

2017-05-11 Thread Robert Bradshaw
I like the idea of WWW and PPP, assuming there is a standard enough stringification of windows and panes. However, we may want to elide adjacent tokens if the window is global or the pane is the only possible (or first?) one to avoid writing things like --of-0005---. On Thu, May 11, 2017 at

Re: Behavior of Top.Largest

2017-05-15 Thread Robert Bradshaw
On Sun, May 14, 2017 at 3:36 PM, Ben Chambers wrote: > Exposing the CombineFn is necessary for use with composed combine or > combining value state. There may be other reasons we made this visible, but > these continue to justify it. > These are the CompareFns, not

Re: [VOTE] First stable release: release candidate #4

2017-05-14 Thread Robert Bradshaw
+1 Verified all the checksums and signatures. (Now that both md5 and sha1 are broken, we should probably provide sha-256 as well.) Spot checked the site and documentation, left comments on the PR. The main landing page has nothing about the Beam stable release, and the top blog entry (right in

Re: Bundling multiple TestPipeline tests into one pipeline

2017-06-23 Thread Robert Bradshaw
+1 http://mail-archives.apache.org/mod_mbox/incubator-beam-dev/201610.mbox/%3CCAFFRZHX4yq%3D%3DxuvkPjwDFezVhWH82oj%2BgpS-OhUMc%3D3QUVaS1g%40mail.gmail.com%3E On Fri, Jun 23, 2017 at 9:23 AM, Davor Bonaci wrote: > This would be a great contribution if anyone wants to give it a

Re: Behavior of Top.Largest

2017-05-22 Thread Robert Bradshaw
> > that is not possible, an easier alternative would be to file a JIRA issue >> > so that the work could be tracked in the other SDK. >> > >> > Ahmet >> > >> > On Fri, May 19, 2017 at 4:22 PM, Robert Bradshaw < >> > rober...@google.com.inva

Re: low availability in the coming 4 weeks

2017-05-25 Thread Robert Bradshaw
Congratulations! On Wed, May 24, 2017 at 8:50 PM, James wrote: > Congratulations Mingmin! Take your time with your new baby. > > On Thu, May 25, 2017 at 11:33 AM, Mingmin Xu wrote: > >> Hello everyone, >> >> I'll take 4 weeks off to take care of my newborn baby. I'm

Re: How can I disable running Python SDK tests when testing my Java change?

2017-05-18 Thread Robert Bradshaw
We could consider splitting Python up into the four things it runs: all tests with Cython, all tests without Cython, docs, and checkstyle. However, I never use Maven when developing the python portions. On Thu, May 18, 2017 at 6:35 PM, Thomas Groh wrote: > Generally I

Re: Proper developer instructions for Python SDK

2017-05-22 Thread Robert Bradshaw
On Mon, May 22, 2017 at 11:13 AM, Chamikara Jayalath <chamik...@apache.org> wrote: > (moving to a separate thread) > > On Mon, May 22, 2017 at 10:45 AM Robert Bradshaw > <rober...@google.com.invalid> wrote: > >> On Sun, May 21, 2017 at 11:45 AM, Dan Halperin >&

Re: Beam Fn API

2017-05-31 Thread Robert Bradshaw
; go >>>>>>>> >>>>>>>>> with a 1-1 relationship or break it up more finely grained and >>>>>>>>> >>>>>>>> dedicate >>>>>> >>>>>>> some machines to have sp

Re: [jira] [Commented] (BEAM-101) Data-driven triggers

2017-06-01 Thread Robert Bradshaw
>> URL: https://issues.apache.org/jira/browse/BEAM-101 >> Project: Beam >> Issue Type: New Feature >> Components: beam-model >>Reporter: Robert Bradshaw >> >> For some applications, it's useful to declare

Re: [jira] [Created] (BEAM-2426) Remove imports from runner/init

2017-06-08 Thread Robert Bradshaw
Yeah, wish we had gotten to this pre-alpha. On Thu, Jun 8, 2017 at 3:10 PM, Ahmet Altay (JIRA) wrote: > Ahmet Altay created BEAM-2426: > - > > Summary: Remove imports from runner/init > Key: BEAM-2426 >

On emitting from finshBundle

2017-05-05 Thread Robert Bradshaw
The JIRA issue https://issues.apache.org/jira/browse/BEAM-1283 suggests requiring an explicit Window when emitting from finishBundle. I'm starting a thread because JIRA/GitHub probably isn't the best (or most efficient) place to have this discussion. The original Spec requires the ambient WindowFn

Re: On emitting from finshBundle

2017-05-05 Thread Robert Bradshaw
> timestamp it wants to use, and output the correct thing to the correct > timestamp and window. I believe that having only the ability to > outputWindowed(value, timestamp, window) makes it quite obvious that this > is necessary. It is not boilerplate to do so, but core functionality. Yes, and

Re: First stable release: version designation?

2017-05-08 Thread Robert Bradshaw
I also have a definite (I guess that's closer to strong than slight) preference for 2.0. With version numbers, a gap is less likely to cause trouble than the ambiguity of an overlap, and easy to document (vs. with ambiguity, one wouldn't even think to consult the documentation without knowing the

Re: On emitting from finshBundle

2017-05-08 Thread Robert Bradshaw
to global combine), as long as we can detect its abuse. I'd be interested in hearing if others have thoughts on this as well. - Robert On Fri, May 5, 2017 at 2:05 PM, Kenneth Knowles <k...@google.com.invalid> wrote: > On Fri, May 5, 2017 at 1:53 PM, Robert Bradshaw <rober...@google

Re: Process for getting the first stable release out

2017-05-08 Thread Robert Bradshaw
hem, touched by significantly more people.) > With post commits running automatically on master only, that seems like a > logical starting point. But, it doesn't matter really -- either way works. > > On Mon, May 8, 2017 at 12:30 PM, Robert Bradshaw < > rober...@google.com.invalid> wr

Re: Process for getting the first stable release out

2017-05-08 Thread Robert Bradshaw
An alternative strategy, given the number of outstanding changes, would be to create release-intended PRs against the release branch itself, then periodically merge back to master. This would reduce the need for manual (and error-prone) cherry-picking. On Fri, May 5, 2017 at 5:21 PM, Davor Bonaci

Re: Process for getting the first stable release out

2017-05-08 Thread Robert Bradshaw
t one or the other before it's merged. There certainly may be cases where we decide to merge into master to be safe and optionally CP after the fact, but for many PRs it's clear where they should end up. > On Mon, May 8, 2017 at 1:10 PM, Robert Bradshaw <rober...@google.com.invalid >

Re: On emitting from finshBundle

2017-05-05 Thread Robert Bradshaw
On Fri, May 5, 2017 at 1:33 PM, Kenneth Knowles <k...@google.com.invalid> wrote: > On Fri, May 5, 2017 at 12:43 PM, Robert Bradshaw < > rober...@google.com.invalid> wrote: > >> On Fri, May 5, 2017 at 12:14 PM, Kenneth Knowles <k...@google.com.invalid> >> wr

Re: Congratulations Davor!

2017-05-04 Thread Robert Bradshaw
Congratulations, Davor! Well deserved. On Thu, May 4, 2017 at 9:53 AM, Hadar Hod wrote: > Congrats, Davor! > > On Thu, May 4, 2017 at 8:56 AM, Chamikara Jayalath > wrote: > >> Congrats Davor. Very well deserved. >> >> - Cham >> >> On Thu, May 4,

Re: Behavior of Top.Largest

2017-05-19 Thread Robert Bradshaw
I see this was implemented. Do we have a policy/guideline for when a name is "bad enough" to merit renaming (and keeping a duplicate, deprecated member around for a year or more). On Mon, May 15, 2017 at 9:25 AM, Robert Bradshaw <rober...@google.com> wrote: > On Sun, May 14, 2

Re: Proposal: Unbreak Beam Python 2.1.0 with 2.1.1 bugfix release

2017-09-19 Thread Robert Bradshaw
s need to happen to fix it? >> >> On Tue, Sep 19, 2017, 5:49 PM Chamikara Jayalath <chamik...@apache.org> >> wrote: >> >> > +1 for cutting 2.1.1 for Python SDK only. >> > >> > Thanks, >> > Cham >> > >> > O

Re: Proposal: Unbreak Beam Python 2.1.0 with 2.1.1 bugfix release

2017-09-19 Thread Robert Bradshaw
+1. Right now anyone who follows our quickstart instructions or otherwise installs the latest release of apache_beam is broken. On Tue, Sep 19, 2017 at 2:05 PM, Charles Chen wrote: > The latest version (2.1.0) of Beam Python ( > https://pypi.python.org/pypi/apache-beam)

Re: Simplifying beam pipeline construction

2017-09-18 Thread Robert Bradshaw
use the proposed design pattern. On the back end, we could > adjust the portability framework protos. > > On Mon, Sep 18, 2017 at 5:49 PM, Robert Bradshaw < > rober...@google.com.invalid> wrote: > >> In the effort to simplify and clean up the Beam API, especially with

Re: [VOTE RESULT] Release 2.1.1, release candidate #1

2017-09-22 Thread Robert Bradshaw
Correction, Chamikara Jayalath is a committer, not a member of the PMC. This does not change the results; the voting still stands unanimous at 4 PMC votes + a significant committer vote in the affirmative. On Fri, Sep 22, 2017 at 2:16 PM, Robert Bradshaw <rober...@google.com> wrote: >

[VOTE RESULT] Release 2.1.1, release candidate #1

2017-09-22 Thread Robert Bradshaw
I'm happy to announce that we have unanimously approved this bugfix release. There are 5 approving PMC member votes: - Chamikara Jayalath - Kenneth Knowles - Daniel Halperin - Jean-Baptiste Onofré - Aljoscha Krettek There are no disapproving votes. Thanks everyone! I will be

Re: TikaIO concerns

2017-09-20 Thread Robert Bradshaw
On Wed, Sep 20, 2017 at 2:17 PM, Sergey Beryozkin wrote: > Hi, > > thanks for the explanations, > > On 20/09/17 16:41, Eugene Kirpichov wrote: >> >> Hi! >> >> TextIO returns an unordered soup of lines contained in all files you ask >> it >> to read. People usually use TextIO

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Robert Bradshaw
+1 to reducing the amount of boilerplate for dealing with side inputs. I prefer the "NewDoFn" style of side inputs for consistency. The primary drawback seems to be lambda's incompatibility with annotations. This is solved in Python by letting all the first annotated argument of the process

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Robert Bradshaw
mplicit" api. If we do go this direction for side inputs, we should also consider it for state and side outputs. > On Wed, Sep 13, 2017 at 1:03 PM Robert Bradshaw <rober...@google.com.invalid> > wrote: > >> +1 to reducing the amount of boilerplate for dealing with side

Re: Using side inputs in any user code via thread-local side input accessor

2017-09-13 Thread Robert Bradshaw
On Wed, Sep 13, 2017 at 1:56 PM, Eugene Kirpichov <kirpic...@google.com.invalid> wrote: > On Wed, Sep 13, 2017 at 1:44 PM Robert Bradshaw <rober...@google.com.invalid> > wrote: > >> On Wed, Sep 13, 2017 at 1:17 PM, Eugene Kirpichov >> <kirpic...@googl

Simplifying beam pipeline construction

2017-09-18 Thread Robert Bradshaw
In the effort to simplify and clean up the Beam API, especially with an eye towards making Beam more friendly towards interactive use, I propose getting rid of the Pipeline object. See the full proposal at https://s.apache.org/no-beam-pipeline . I'd like to hear people's thoughts on the idea. -

Re: Proposal: Unbreak Beam Python 2.1.0 with 2.1.1 bugfix release

2017-09-21 Thread Robert Bradshaw
On Wed, Sep 20, 2017 at 8:42 PM, Kenneth Knowles wrote: > +1 to a point release to unbreak users. I'm agnostic as to whether to do so > by pinning one thing or unpinning another. Different philosophies there. The safe immediate change is to restrict to the formerly

[VOTE] Release 2.1.1, release candidate #1

2017-09-21 Thread Robert Bradshaw
Hi everyone, As discussed earlier in this list [1] we'd like to get a bugfix release out for beam 2.1. Please review and vote on the release candidate #1 for the version 2.1.1, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments)
