GCS Issues running java tests

2018-03-02 Thread Robert Bradshaw
When trying to run the Java tests, I keep getting Expected: (an instance of java.lang.IllegalArgumentException and exception with message a string containing "Error constructing default value for gcpTempLocation: tempLocation is not a valid GCS path" and exception with cause exception with

Re: Releases and user support

2018-03-02 Thread Robert Bradshaw
On Fri, Mar 2, 2018 at 8:45 AM Romain Manni-Bucau wrote: > Hi guys, > I didn't find a page about beam release support. With the fast minor release rhythm which is targeted by beam (see other threads on that), I wonder what - as an end user - you should expect as breakage

Re: Merging Python code? Help avoid Python 3 regressions with these two simple steps :)

2018-03-02 Thread Robert Bradshaw
To address the first point, 3.4 is almost certainly sufficient for our needs (running lint_py3 to prevent regressions). Also, +1 that automating this is going to be much more effective than asking users to manually do extra steps. Long-term, we should definitely support 3.5+, definitely not

Re: Beam 2.4.0

2018-03-02 Thread Robert Bradshaw
v cycle will be smaller). > > > > If we do this for the next cycle we will have a 6 week ‘dev’ period and > then we > > will have optimistically an average of 2 weeks of ‘releasing’ and 6 > weeks ‘dev’ > > cycles. > > > > On Thu, Mar 1, 2018 at 6:46 AM, Jean-

Re: Support non-keyed stateful ParDo

2018-04-25 Thread Robert Bradshaw
On Wed, Apr 25, 2018 at 5:45 PM Xinyu Liu wrote: > Hi, > I am working on adding the stateful ParDo to the upcoming BEAM Samza runner, and realized that the state for each ParDo processElement() is not only associated with the window of the element, but also the key of the

Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

2018-09-27 Thread Robert Bradshaw
I agree that we should create a good pointer for cleaning up PRs, and request (though not require) that authors do it. It's unfortunate though that squashing during a review makes things difficult to follow, so adds one more round trip. We could consider for those PRs that make sense as a single

Re: Removing documentation for old Beam versions

2018-09-27 Thread Robert Bradshaw
he CI process, then I'm in favor of that. It looks cleaner to >>> not mingle source and generated files in the same repo. Otherwise we can do >>> the asf-site branch in the main repo and get rid of docs from it once we >>> found a better solution. >>> >>> >>

Re: Beam website sources migrated to apache/beam

2018-10-08 Thread Robert Bradshaw
> >> We hope the new contribution experience will be seamless and make your > >> website contributions the best part of your day. If you find any rough > >> edges or areas for improvement, please add them to the fit-and-finish > >> list here: https://issues.apache.

Re: Splitting the repo

2018-10-12 Thread Robert Bradshaw
CI but the dev are never affected by >>> that and the build does not mess up their machines as well. >>> >>> Today the main blocker is that default "profile" (script) is not matching >>> dev persona and therefore there is no real hope to have external

Re: post-commit failure emails

2018-10-12 Thread Robert Bradshaw
I agree the Jenkins emails are spammy (to the point that I honestly can't follow all of them). +1 to emailing "suspects" as defined by those that impacted the build in the time it turned from green to red. On Fri, Oct 12, 2018 at 12:55 AM Udi Meiri wrote: > > The email trigger is set up to trigger on

Re: Python SDK: .options deprecation

2018-10-12 Thread Robert Bradshaw
Correct. Among other things, we don't want to expose the choice of runner during pipeline construction (perhaps it's even deferred), or characteristics like streaming vs. batch (the runner should be able to make this choice on its own). This was not yet pushed all the way through in Python as it

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-15 Thread Robert Bradshaw
feature that SDKs may want to support, but I wouldn't want to require this complexity for bootstrapping an SDK. Regarding always keeping runner options separate, +1, though I'm not sure the line is always clear. > On Mon, Oct 15, 2018 at 8:04 AM Robert Bradshaw wrote: >> >> On Mon

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-15 Thread Robert Bradshaw
On Mon, Oct 15, 2018 at 11:30 PM Lukasz Cwik wrote: > > On Mon, Oct 15, 2018 at 1:17 PM Robert Bradshaw wrote: >> >> On Mon, Oct 15, 2018 at 7:50 PM Lukasz Cwik wrote: >> > >> > I agree with the sentiment for better error checking. >> > >> >

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-15 Thread Robert Bradshaw
On Mon, Oct 15, 2018 at 3:58 PM Maximilian Michels wrote: > > I agree that the current approach breaks the pipeline options contract > because "unknown" options get parsed in the same way as options which > have been defined by the user. FWIW, I think we're already breaking this "contract."

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-16 Thread Robert Bradshaw
its JSON form? > > On Mon, Oct 15, 2018 at 2:41 PM Robert Bradshaw > wrote: > >> On Mon, Oct 15, 2018 at 11:30 PM Lukasz Cwik wrote: >> > >> > On Mon, Oct 15, 2018 at 1:17 PM Robert Bradshaw >> wrote: >> >> >> >>

Re: Rethinking Timers as PCollections

2018-10-16 Thread Robert Bradshaw
eems like the pun of "PCollection" for so many purposes is hitting its limit. >> >> Timers should fire according to just the watermark of the data input, but nevertheless are a hold on GC and also output watermark. >> >> Kenn >> >> On Thu, Oct 4, 20

Re: [PROPOSAL] Using Bazel and Docker for Python SDK development and tests

2018-10-18 Thread Robert Bradshaw
all the different environments. On Wed, Oct 17, 2018 at 10:17 AM Udi Meiri wrote: > >> On Wed, Oct 17, 2018 at 1:38 AM Robert Bradshaw >> wrote: >> >>> On Tue, Oct 16, 2018 at 12:48 AM Udi Meiri wrote: >>> >>>> Hi, >>>> >>>

Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

2018-10-16 Thread Robert Bradshaw
nt >>> to find out what caused this change. >>> >>> I believe we can improve our commit guidelines in this way and it should >>> help to have commit history more clean and easy to read. >>> >>> On 1 Oct 2018, at 06:34, Kenneth Knowles wrote: >

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-16 Thread Robert Bradshaw
to be nested within an option. This >>> is amplified by there being two Runners the user needs to be aware of, >>> i.e. PortableRunner and the actual Runner (Dataflow/Flink/Spark..). >>> >>> I feel like we would eventually replicate all options in the SDK b

Re: [PROPOSAL] Using Bazel and Docker for Python SDK development and tests

2018-10-18 Thread Robert Bradshaw
and often. The cost of test failures in postsubmit is *significantly* higher, we should only put stuff we can't test earlier there. (If we do move things, I would suggest we keep at least some of the gcp and py3 tests in presubmit if we can't afford to run the whole suite). > On Thu, Oct 18, 2

Re: What is required for LTS releases? (was: [PROPOSAL] Prepare Beam 2.8.0 release)

2018-10-22 Thread Robert Bradshaw
d Blog >> <http://rmannibucau.wordpress.com> | Github >> <https://github.com/rmannibucau> | LinkedIn >> <https://www.linkedin.com/in/rmannibucau> | Book >> <https://www.packtpub.com/application-development/java-ee-8-high-performance> >> >> >

Re: [PROPOSAL] Using Bazel and Docker for Python SDK development and tests

2018-10-17 Thread Robert Bradshaw
On Tue, Oct 16, 2018 at 12:48 AM Udi Meiri wrote: > Hi, > > In light of increasing Python pre-commit times due to the added Python 3 > tests, > I thought it might be time to re-evaluate the tools used for Python tests > and development, and propose an alternative. > > Currently, we use

Re: [PROPOSAL] Using Bazel and Docker for Python SDK development and tests

2018-10-18 Thread Robert Bradshaw
probably less of a burden requiring this for developers, and would simplify some of our code. However, there's probably only a small subset that merits testing with Cython and without. > On Thu, Oct 18, 2018 at 12:45 AM Robert Bradshaw > wrote: > >> We run the full suite of Python unit t

Re: [PROPOSAL] Using Bazel and Docker for Python SDK development and tests

2018-10-18 Thread Robert Bradshaw
On Wed, Oct 17, 2018 at 7:17 PM Udi Meiri wrote: > On Wed, Oct 17, 2018 at 1:38 AM Robert Bradshaw > wrote: > >> On Tue, Oct 16, 2018 at 12:48 AM Udi Meiri wrote: >> >>> Hi, >>> >>> In light of increasing Python pre-commit times due to the

Re: Java > 8 support

2018-10-18 Thread Robert Bradshaw
On Thu, Oct 18, 2018 at 4:55 AM Kenneth Knowles wrote: > Just to add to what Luke said: The reason we had those Java 8-only modules > was because some underlying tech (example: Gearpump) required Java 8. If an > engine requires something then it is OK for a user who chooses the runner > for that

Re: [PROPOSAL] Using Bazel and Docker for Python SDK development and tests

2018-10-18 Thread Robert Bradshaw
ns the same set of tests, one with --streaming and the other >>> without. This should be able to work for Python as well. >>> >>> The Worker API had some updates in the latest Gradle release but still >>> seems rough to use. >>> >>> On Wed, Oct 17, 2018 at

Re: [PROPOSAL] Move sorting to sdks-java-core

2018-10-18 Thread Robert Bradshaw
+1 to splitting out the Hadoop deps. As has been said, there's no need to move it to core for runners to optimize this. But perhaps a case could be made that this belongs in core? (On the other hand, recent discussions indicate a desire to make core even smaller.) Also, +1 to re-thinking the

Re: Why not adding all coders into ModelCoderRegistrar?

2018-10-16 Thread Robert Bradshaw
Any coders added to the ModelCoderRegistrar require support from *all* SDKs, which is why that set is chosen sparingly. Could you clarify exactly what you're trying to achieve? It sounds like there's some case where you know the SDK will submit a KV with a Void and/or VarIntCoder in the key, and

Re: [Proposal] Euphoria DSL - looking for reviewers

2018-10-16 Thread Robert Bradshaw
Ideally one (or all) of you can become committers [1], which I think should be the goal. While for the time being this would involve the overhead of getting existing committers to sign off on PRs (which can be reviewed by others as well), this can actually be beneficial as it will be a forcing

Re: [PROPOSAL] allow the users to anticipate the support of features in the targeted runner.

2018-10-17 Thread Robert Bradshaw
On Wed, Oct 17, 2018 at 3:17 PM Kenneth Knowles wrote: > On Wed, Oct 17, 2018 at 3:12 AM Maximilian Michels wrote: > >> A dry-run feature would be useful, i.e. the user can run an inspection >> on the pipeline to see if it contains any features which are not >> supported by the Runner. >> > >

Re: Integrating Stateful DoFns from the Python SDK

2018-10-17 Thread Robert Bradshaw
Yes, we should be enforcing keyness (and use of KeyCoder with) stateful DoFns, similar to what we do for GBKs. See e.g. https://github.com/apache/beam/pull/6304#issuecomment-421935375 (This possibly relates to a long-standing issue that the coder inference should be moved up into construction, or

Re: Integrating Stateful DoFns from the Python SDK

2018-10-17 Thread Robert Bradshaw
my stateful DoFn: > > > >.with_output_types(typehints.KV[K, V]) > > > > For some reason `.with_input_types(typehints.KV[K, V])` on my stateful > > DoFn did not work. > > > > Until we enforce KV during pipeline construction, we will have to throw > >

Re: [PROPOSAL] Move sorting to sdks-java-core

2018-10-23 Thread Robert Bradshaw
I like the idea of asking for a coder for T with properties X. (E.g. the order-preserving one may not be the most efficient, so a poor default, but required in some cases.) Note that if we go the route of secondary-key-extraction, we don't even need a full coder here, just an order-preserving

Re: Follow up ideas, to simplify creating MonitoringInfos.

2018-10-24 Thread Robert Bradshaw
Thanks for bringing this to the list; it's a good question. I think the difficulty comes from trying to statically define a list of possibilities that should instead be runtime values. E.g. we're currently up to about a dozen distinct types, and having a setter for each is both verbose and

Re: Java Precommit duration

2018-10-23 Thread Robert Bradshaw
On Tue, Oct 23, 2018 at 11:28 PM Kenneth Knowles wrote: > Hi all, > > Java Precommit duration is about 1h15. That is quite a burden. Especially > if something gets broken. > I'm in favor of (simple!) build breaks going in before precommits finish, on the promise that the offending test(s)

Re: [VOTE] Release 2.8.0, release candidate #1

2018-10-26 Thread Robert Bradshaw
Thanks Tim! This was my only hesitation, and sounds like we're in the clear here. +1 (binding) On Fri, Oct 26, 2018 at 5:05 PM Tim Robertson wrote: > > A colleague and I tested on 2.7.0 and 2.8.0RC1: > > 1. Quickstart on Spark/YARN/HDFS (CDH 5.12.0) (commented in spreadsheet) > 2. Our Avro to

Re: Unbalanced FileIO writes on Flink

2018-10-26 Thread Robert Bradshaw
an be overriden? However, > it is not used by the WritesFiles code. > > > -Max > > On 26.10.18 11:41, Robert Bradshaw wrote: > > I think it's worth adding a URN for the operation of distributing > > "evenly" into an "appropriate" number of shards.

Re: [PROPOSAL] Additional design for the Beam Python State and Timers API

2018-11-05 Thread Robert Bradshaw
; are required to be processed serially (generally key+window) These probably don't merit a new (pair of) named operation(s). The motivation to add them was "why can I use state in a DoFn, but not in Map/FlatMap?" which could be justified by the above. As for the side input question, de

Re: What is required for LTS releases? (was: [PROPOSAL] Prepare Beam 2.8.0 release)

2018-11-05 Thread Robert Bradshaw
? For example, are we going to cut regular patch releases > for supported branch (release-2.7.0) within the supported period that fixes > known issues ? My preference is to keep existing policy on this regard. > > Thanks, > Cham > > On Mon, Nov 5, 2018 at 5:12 AM Robert Bradsha

Re: Evolving a Coder for an added field

2018-11-05 Thread Robert Bradshaw
I think we'll want to allow upgrades across SDK versions. A runner should be able to recognize when a coder (or any other aspect of the pipeline) has changed and adapt/reject accordingly. (Until we remove coders from sources/sinks, there's also possibly the expectation that one should be able to

Re: Can't define a pytype alias from Beam's PCollection type.

2018-11-08 Thread Robert Bradshaw
On Wed, Nov 7, 2018 at 10:30 PM Zach Moshe wrote: > > (Adding the public Beam-dev group) > > On Wed, Nov 7, 2018 at 2:26 PM Zach Moshe wrote: >> >> Hi, >> I've noticed that `beam.core.pvalue.PCollection` doesn't support a >> `__getitem__()` that returns a `GenericMeta` type (like regular types

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-11-08 Thread Robert Bradshaw
There are two questions here: (A) What do we do in the short term? I think adding every runner option to every SDK is not sustainable (n*m work, assuming every SDK knows about every runner), and having a patchwork of options that were added as one-offs to SDKs is not desirable either. Furthermore,

Re: Performance of BeamFnData between Python and Java

2018-11-08 Thread Robert Bradshaw
Very cool to hear of this progress on Samza! Python protocol buffers are extraordinarily slow (lots of reflection, attribute lookups, and bit fiddling for serialization/deserialization that is certainly not Python's strong point). Each bundle processed involves multiple protos being constructed

Re: Performance of BeamFnData between Python and Java

2018-11-08 Thread Robert Bradshaw
I'd assume you're compiling the code with Cython as well? (If you're using the default containers, that should be fine.) On Fri, Nov 9, 2018 at 12:09 AM Robert Bradshaw wrote: > > Very cool to hear of this progress on Samza! > > Python protocol buffers are extraordinarily slow (lots o

Spotless and lint precommit

2018-11-13 Thread Robert Bradshaw
I really like how Spotless runs separately and quickly for Java code. Should we do the same for Python lint?

Re: [PROPOSAL] ParquetIO support for Python SDK

2018-11-13 Thread Robert Bradshaw
Was there resolution on how to handle row group size, given that it's hard to pick a decent default? IIRC, the ideal was to base this on byte sizes; will this be in v1 or will there be other parameter(s) that we'll have to support going forward? On Tue, Oct 30, 2018 at 10:42 PM Heejong Lee wrote:

Re: Python profiling

2018-11-16 Thread Robert Bradshaw
le runners, either disable container cleanup >> (using --retainDockerContainers=true) or use remote distributed file >> system path. >> >> On Mon, Nov 5, 2018 at 1:05 AM Robert Bradshaw >> wrote: >> >>> Any portable runner should pick it up automatical

Re: [DISCUSS] More precision supported by DATETIME field in Schema

2018-11-07 Thread Robert Bradshaw
ownstream consequences for all runners. >>> >>> On Tue, Nov 6, 2018 at 12:38 AM Ismaël Mejía wrote: >>>> >>>> +1 to more precision even to the nano level, probably via Reuven's >>>> proposal of a different internal representation. >>>>

Re: Evolving a Coder for an added field

2018-11-06 Thread Robert Bradshaw
> > > On Mon, Nov 5, 2018 at 7:54 AM Jean-Baptiste Onofré wrote: >> >> It makes sense to have a more concrete URN including the version. >> >> Good idea Robert. >> >> Regards >> JB >> >> On 05/11/2018 16:52, Robert Bradshaw wrote: >

Re: [DISCUSS] More precision supported by DATETIME field in Schema

2018-11-06 Thread Robert Bradshaw
+1 to offering more granular timestamps in general. I think it will be odd if setting the element timestamp from a row DATETIME field is lossy, so we should seriously consider upgrading that as well. On Tue, Nov 6, 2018 at 6:42 AM Charles Chen wrote: > > One related issue that came up before is

Re: Stackoverflow Questions

2018-11-06 Thread Robert Bradshaw
People who ask on SO probably expect to be answered on SO. I don't think it makes sense to subscribe users to automated emails like this, but a weekly (or maybe even daily) summary to dev would probably be helpful in effectively getting these questions answered. On Tue, Nov 6, 2018 at 9:27 AM

Re: [VOTE] Mark 2.7.0 branch as a long term support (LTS) branch

2018-11-09 Thread Robert Bradshaw
+1 approve. On Fri, Nov 9, 2018 at 2:47 AM Ahmet Altay wrote: > > Hi all, > > Please review the following statement: > > "2.7.0 branch will be marked as the long-term-support (LTS) release branch. > This branch will be supported for a window of 6 months starting from the day > it is marked as

Re: [DISCUSS] More precision supported by DATETIME field in Schema

2018-11-09 Thread Robert Bradshaw
ifies causing problems for so many users. >> >> Reuven >> >> On Wed, Nov 7, 2018 at 4:56 PM Robert Bradshaw wrote: >>> >>> Yes, microseconds is a good compromise for covering a long enough >>> timespan that there's little reason it could be

Re: [DISCUSS] SplittableDoFn Java SDK User Facing API

2018-11-07 Thread Robert Bradshaw
I think that not returning the user's specific subclass should be fine. Does the removal of markDone imply that the consumer always knows a "final" key to claim on any given restriction? On Wed, Nov 7, 2018 at 1:45 AM Lukasz Cwik wrote: > > I have started to work on how to change the user facing

Re: What is required for LTS releases? (was: [PROPOSAL] Prepare Beam 2.8.0 release)

2018-10-05 Thread Robert Bradshaw
On Fri, Oct 5, 2018 at 3:59 AM Chamikara Jayalath wrote: > > On Thu, Oct 4, 2018 at 9:39 AM Ahmet Altay wrote: > >> I agree that LTS releases require more thought. Thank you for raising >> these questions. What other open questions do we have related LTS releases? >> >> One way to do this would

Re: A new contributor

2018-10-05 Thread Robert Bradshaw
Be glad to do that. Done. On Fri, Oct 5, 2018 at 12:03 PM Gleb Kanterov wrote: > Hi all, > > My name is Gleb and I work on Data Infrastructure at Spotify. We use > Apache Beam and develop spotify/scio . > Time-to-time I create JIRA issues and submit pull

Re: [DISCUSS] Gradle for the build ?

2018-10-09 Thread Robert Bradshaw
On Tue, Oct 9, 2018 at 10:04 AM Jean-Baptiste Onofré wrote: > Hi guys, > > I know that's a hot topic, but I have to bring this discussion on the > table. > Thank you for bringing this up and revisiting it now that we have some experience. > Some months ago, we discussed about migrating our

Splitting the repo

2018-10-10 Thread Robert Bradshaw
Hi everyone, While IMHO it's too early to even be able to split the repo, it's not too early to talk about it, and I wanted to spin this off to keep the other thread focused. In particular, I am trying to figure out exactly what is hoped to be gained by splitting things up. In my experience, a

Re: Splitting the repo

2018-10-10 Thread Robert Bradshaw
interface/SPI. >> >> Our users would then be able to pick the part of the core they want, >> resulting with lighter artifacts, and for us, it gives a more flexible >> approach. >> >> Regards >> JB >> >> On 10/10/2018 10:26, Robert Bradshaw wrote

Re: [DISCUSS] Gradle for the build ?

2018-10-10 Thread Robert Bradshaw
On Wed, Oct 10, 2018 at 8:03 AM Jean-Baptiste Onofré wrote: > Hi Robert, > > about your point about we never fully build the project, even if I > agree, it's what we "sold" with Gradle. > Because, with Maven you can also build a single module without problem. > Good incremental support for the

Re: Splitting the repo

2018-10-10 Thread Robert Bradshaw
> I discussed with Luke and Reuven to introduce core-sql, core-schema, > > core-sdf, ... > > > > It's not a huge effort, and would allow us to move forward on Beam > "more > > API oriented" approach. > > > > R

Re: [DISCUSS] Gradle for the build ?

2018-10-10 Thread Robert Bradshaw
Some rough stats (because I was curious): The gradle files have been edited by ~79 unique contributors over 696 distinct commits, whereas the maven ones were edited (over a longer time period) by ~130 unique contributors over 1389 commits [1]. This doesn't capture how much effort was put into

Re: Splitting the repo

2018-10-10 Thread Robert Bradshaw
nted" approach. > > Regards > JB > > On 10/10/2018 10:12, Robert Bradshaw wrote: > > Hi everyone, > > > > While IMHO it's too early to even be able to split the repo, it's not too > > early to talk about it, and I wanted to spin this off to keep the o

Re: Splitting the repo

2018-10-10 Thread Robert Bradshaw
o introduce core-sql, core-schema, >> core-sdf, ... >> >> It's not a huge effort, and would allow us to move forward on Beam "more >> API oriented" approach. >> >> Regards >> JB >> >> On 10/10/2018 10:12, Robert Bradshaw wrote: >> >

Re: What is required for LTS releases? (was: [PROPOSAL] Prepare Beam 2.8.0 release)

2018-10-10 Thread Robert Bradshaw
s a branch where we will cherry-pick some important fixes in > >>> the future and where we will cut release. It's the approach I use in > >>> other Apache projects (especially Karaf) and it works fine. > >> > >> > >> JB, does Karaf has a documented p

Re: Re: How to optimize the performance of Beam on Spark(Internet mail)

2018-09-28 Thread Robert Bradshaw
Something here on the Beam side is clearly linear in the input size, as if there's a bottleneck where we're not able to get any parallelization. Is the Spark variant running in parallel? On Fri, Sep 28, 2018 at 4:57 AM devinduan(段丁瑞) wrote: > Hi > I have completed my test. > 1. Spark

Re: Splitting the repo

2018-10-10 Thread Robert Bradshaw
> >> > For the clean point it is quite linked to the build tools and fake env >> > for not native modules for the build tool (go for gradle which is java >> > first for instance). This is why having a real build which is natural >> > per language would be

Re: [DISCUSS] - Separate JIRA notifications to a new mailing list

2018-10-11 Thread Robert Bradshaw
Huge +1 from me too. On Thu, Oct 11, 2018 at 2:42 PM Jean-Baptiste Onofré wrote: > > +1 > > We are doing the same in Karaf as well. > > Regards > JB > > On 11/10/2018 14:35, Colm O hEigeartaigh wrote: > > Hi all, > > > > Apologies in advance if this has already been discussed (and rejected). > >

Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

2018-09-29 Thread Robert Bradshaw
>>>> wrote: >>>> >>>>> I brought up this discussion a few months ago from the other side: I >>>>> don't like my commits being squashed. I try to create logical commits that >>>>> each passes tests and could be broken up into multiple

Re: [PROPOSAL] Prepare Beam 2.8.0 release

2018-10-04 Thread Robert Bradshaw
+1 to cutting the release. I agree that the LTS label requires more discussion. I think it boils down to the question of whether we are comfortable with encouraging people to not upgrade to the latest Beam. It probably boils down to creating a list of (potential) blockers and then going from

Re: Rethinking Timers as PCollections

2018-10-04 Thread Robert Bradshaw
hich seems to be already the case). Timers as > separate PCollections seems elegant but less practical to me. > > -Max > > [Disclaimer: I could be wrong since I just thought about this in more > detail] > > On 20.09.18 00:28, Robert Bradshaw wrote: > > On Wed, Sep 19, 2018 at

Re: Rethinking Timers as PCollections

2018-09-19 Thread Robert Bradshaw
On Wed, Sep 19, 2018 at 11:54 PM Lukasz Cwik wrote: > > On Wed, Sep 19, 2018 at 2:46 PM Robert Bradshaw > wrote: > >> On Wed, Sep 19, 2018 at 8:31 PM Lukasz Cwik wrote: >> >>> *How does modelling a timer as a PCollection help the Beam model?* >>> >

Re: Rethinking Timers as PCollections

2018-09-19 Thread Robert Bradshaw
tation. The special treatment >> (and slight confusion) at the graph level perhaps was an early warning >> sign, discovering the extra complexity wiring this in a runner should be a >> reason to revisit. >> >> Conceptually timers are special state, they are certainly more state

Re: Retiring runners?

2018-09-21 Thread Robert Bradshaw
Glad to hear Gearpump is still alive. It is hard to measure how much of a burden these additional runners are at the moment. I suggest that if it comes to a point that non-trivial changes are needed, we reach out to the list. If no one agrees to support it, we could disable the tests and, after

Re: Compatibility Matrix vs Runners in the code base

2018-09-21 Thread Robert Bradshaw
I don't know that we need to limit the matrix to runners in the Beam codebase (in fact, I could envision a world where most runners live in an upstream codebase), but at the very least each of these runners should have a link to a page about using that runner with Beam. On Fri, Sep 21, 2018 at

Re: [jira] [Commented] (BEAM-5468) Allow runner to set worker log level in Python SDK harness.

2018-09-24 Thread Robert Bradshaw
to set worker log level in Python SDK harness. > > --- > > > > Key: BEAM-5468 > > URL: https://issues.apache.org/jira/browse/BEAM-5468 > > Project: Beam > > Issue Type: Improvement

Re: Removing documentation for old Beam versions

2018-09-26 Thread Robert Bradshaw
requisite for this effort. The goal of this work is to >>>>>>> improve the reliability of automation for contributing website changes. >>>>>>> At >>>>>>> last measure, only about half of beam-site PR merges use Mergebot >>>>

Re: Removing documentation for old Beam versions

2018-09-26 Thread Robert Bradshaw
h from a > Git repo, SVN, or a UI-based CMS interface. > > On Wed, Sep 26, 2018 at 9:45 AM Robert Bradshaw > wrote: > >> I am also definitely in favor of a single repository. Perhaps I'm just >> misunderstanding why the generated must be put in a git repository at >> al

Re: [VOTE] Release 2.7.0, release candidate #2

2018-09-25 Thread Robert Bradshaw
+1 (binding) I verified all the signatures and hashes, as well as one of the Python wheels, and that we're not shipping gradle[w] but otherwise the content matches the git repo (except a SNAPSHOT vs version change to the source). The changes [1] look minimal compared to RC1, so most of the

Re: [VOTE] Release 2.7.0, release candidate #3

2018-09-26 Thread Robert Bradshaw
+1 (binding), same verification as before. On Wed, Sep 26, 2018 at 7:36 AM Charles Chen wrote: > To clarify, the only difference between RC2 and RC3 is the Python fix > https://github.com/apache/beam/pull/6494. > > This means that the Java validations from RC2 should carry over, though I >

Re: Discussion: Scheduling across runner and SDKHarness in Portability framework

2018-09-19 Thread Robert Bradshaw
GBK - ExecutableStage - GBK - ... (or is it not always a digraph of this form, possibly with branching)? > > Thanks, > Thomas > > > On Fri, Sep 14, 2018 at 2:56 AM Robert Bradshaw > wrote: > >> Currently the best solution we've come up with is that we must process an &

Re: [ANNOUNCEMENT] New Beam chair: Kenneth Knowles

2018-09-20 Thread Robert Bradshaw
Congratulations Kenn! And thank you, Davor, for the hard work you've put in these last several years. On Thu, Sep 20, 2018 at 9:50 AM Tim Robertson wrote: > Thank you to Davor all the PMC - I can only imagine how much work it has > been to get Beam to where it is today. > > Congratulations

Re: [DISCUSS] Committer Guidelines / Hygene before merging PRs

2018-09-28 Thread Robert Bradshaw
ure. > > > https://lists.apache.org/thread.html/8d29e474e681ab9123280164d95075bb8b0b91486b66d3fa25ed20c2@%3Cdev.beam.apache.org%3E > > Andrew > > On Fri, Sep 28, 2018 at 7:29 AM Chamikara Jayalath > wrote: > >> >> >> On Thu, Sep 27, 2018 at 9:51 AM Robert Bradshaw >>

Re: Beam JobService Problem

2019-01-15 Thread Robert Bradshaw
On Tue, Jan 15, 2019 at 1:19 AM Ankur Goenka wrote: > > Thanks Sam for bringing this to the list. > > As preparation_ids are not reusable, having preparation_id and job_id same > makes sense to me for Flink. I think we should change the protocol and only have one kind of ID. As well as solving the

Re: [DISCUSSION] UTests and embedded backends

2019-01-21 Thread Robert Bradshaw
I am of the same opinion, this is the approach we're taking for Flink as well. Various mitigations (e.g. capping the parallelism at 2 rather than the default of num cores) have helped. Several times the idea has been proposed to group unit tests together for "expensive" backends. E.g. for

Re: gradle clean causes long-running python installs

2019-01-21 Thread Robert Bradshaw
Just some background, grpcio-tools is what's used to do the proto generation. Unfortunately it's expensive to compile and doesn't provide very many wheels, so we want to install it once, not every time. (It's also used in more than just tests; one needs it every time the .proto files change.)

Re: [DISCUSSION] UTests and embedded backends

2019-01-22 Thread Robert Bradshaw
On Mon, Jan 21, 2019 at 10:42 PM Kenneth Knowles wrote: > > Robert - you meant this as a mostly-automatic thing that we would engineer, > yes? Yes, something like TestPipeline that buffers up the pipelines and then executes on class teardown (details TBD). > A lighter-weight fake, like using

Cross-language pipelines

2019-01-22 Thread Robert Bradshaw
Now that we have the FnAPI, I started playing around with support for cross-language pipelines. This will allow things like IOs to be shared across all languages, SQL to be invoked from non-Java, TFX tensorflow transforms to be invoked from non-Python, etc. and I think is the next step in

Re: Cross-language pipelines

2019-01-23 Thread Robert Bradshaw
>> >>> >>> On Tue, Jan 22, 2019 at 10:53 AM Chamikara Jayalath >>> wrote: >>>> >>>> Thanks Robert. >>>> >>>> On Tue, Jan 22, 2019 at 4:39 AM Robert Bradshaw >>>> wrote: >>>>> >>>>> Now

Re: Dealing with expensive jenkins + Dataflow jobs

2019-01-23 Thread Robert Bradshaw
I like the idea of creating separate project(s) for load tests so as to not compete with other tests and the standard development cycle. As for how many workers is too many, I would take the tack "what is it we're trying to test?" Unless you're stress-testing the shuffle itself, much of what Beam

Re: Cross-language pipelines

2019-01-23 Thread Robert Bradshaw
happen as part of construction because the set of outputs (and their properties) can be dynamic based on the expansion. > Thanks, > Max > > On 23.01.19 04:12, Robert Bradshaw wrote: > > No, this PR simply takes an endpoint address as a parameter, expecting > > it to alread

Re: Python Flink tests failing on Jenkins

2019-01-16 Thread Robert Bradshaw
ppreciated. On Wed, Jan 16, 2019 at 3:32 AM Ahmet Altay wrote: > > +Robert Bradshaw > > Is it this test suite: > https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PreCommit_Python_ValidatesRunner_Flink_Commit/ > > There is a recent change related to that > https:

Re: How to use "PortableRunner" in Python SDK?

2019-01-23 Thread Robert Bradshaw
We should probably make the job endpoint mandatory for PortableRunner, and offer a separate FlinkRunner (and others) that provides a default endpoint and otherwise delegates everything down. On Thu, Nov 15, 2018 at 12:07 PM Maximilian Michels wrote: > > > 1) The default behavior, where

Re: excessive java precommit logging

2018-12-20 Thread Robert Bradshaw
Interestingly, I was thinking exactly the same thing the other day. If we could drop the info logs for passing tests, that would be ideal. Regardless, tests should fail (when possible) with actionable messages. I think the rare case of not being able to reproduce the error locally if info logs

Re: [VOTE] Release Vendored gRPC 1.13.1 v0.2, release candidate #1

2018-12-21 Thread Robert Bradshaw
+1, let's get this out. We can decide about 2.9.1 later. On Fri, Dec 21, 2018 at 10:43 AM Maximilian Michels wrote: > > +1 > > On 20.12.18 23:11, Tyler Akidau wrote: > > +1, Approve the release. > > > > -Tyler > > > > On Thu, Dec 20, 2018 at 9:49 AM Ahmet Altay > >

Re: Report to the Board, December 2018

2018-12-12 Thread Robert Bradshaw
Mostly looks good to me. I would probably omit the {issues,builds}@beam.apache.org stats as "nothing significant in the figures" but note that the dev list excludes the previous automated emails (or at least a reference to where you say this above). Is it worth noting StackOverflow activity here

Re: [DISCUSS] SplittableDoFn Java SDK User Facing API

2018-11-30 Thread Robert Bradshaw
On Fri, Nov 30, 2018 at 6:38 PM Lukasz Cwik wrote: > > Sorry, for some reason I thought I had answered these. No problem, thanks for your patience :). > On Fri, Nov 30, 2018 at 2:20 AM Robert Bradshaw wrote: >> >> I still have outstanding questions (above) about >>

Re: [DISCUSS] Structuring Java based DSLs

2018-11-30 Thread Robert Bradshaw
I don't really see Euphoria as a subset of SQL or the other way around, and I think it makes sense to use either without the other, so by this criterion it makes more sense to keep them as siblings than to nest one under the other. That said, I think it's really good to have a bunch of shared code, e.g. a join library that could be

Re: [SDF] Why do we need markDone (or an equivalent claim)?

2018-11-30 Thread Robert Bradshaw
and would be similar to passing in Long.MAX_VALUE for the file > offset range. Having to choose a value to pass depending on the restriction tracker type is something that could simply be eliminated. > On Fri, Nov 30, 2018 at 2:45 AM Robert Bradshaw wrote: >> >> In

Re: Handling large values

2018-11-29 Thread Robert Bradshaw
On Thu, Nov 29, 2018 at 7:08 PM Lukasz Cwik wrote: > > On Thu, Nov 29, 2018 at 7:13 AM Robert Bradshaw wrote: >> >> On Thu, Nov 29, 2018 at 2:18 AM Lukasz Cwik wrote: >> > >> > I don't believe we would need to change any other coders since >> > Seek
