Re: Beam Samza Runner status update

2018-10-12 Thread Xinyu Liu
@Max: absolutely we should work together! FlinkRunner has been our best reference since the start of our SamzaRunner, and the previous work in Flink portable runner has been extremely valuable to us too. We haven't got to the point of portable stateful processing yet. Our next step is to hook up a

[BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-12 Thread Thomas Weise
[moving to the list] The requirement driving this part of the change was to allow a user to specify pipeline options that a runner supports without having to declare those in each language SDK. In the specific scenario, we have options that the Flink runner supports (and can validate), that are

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-12 Thread Charles Chen
For context, I made comments on https://github.com/apache/beam/pull/6600 noting that the changes being made were not good for Beam backwards-compatibility. The change as is allows users to use pipeline options without explicitly defining them, which is not the type of usage we would like to

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-12 Thread Henning Rohde
Agree that pipeline options lack some mechanism for scoping. It is also not always possible to distinguish options meant to be consumed at pipeline construction time, by the runner, by the SDK harness, by the user code or any combination -- and this causes confusion every now and then. For Dataflow,

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-12 Thread Ahmet Altay
On Fri, Oct 12, 2018 at 10:11 AM, Charles Chen wrote: > For context, I made comments on https://github.com/apache/beam/pull/6600 > noting that the changes being made were not good for Beam > backwards-compatibility. The change as is allows users to use pipeline > options without explicitly

Re: post-commit failure emails

2018-10-12 Thread Kenneth Knowles
The important thing is sending just one email. And it is quite important to get a build green before turning it on. Otherwise the suspects are, indeed, every email address in the history of the project. We've experienced this. On Fri, Oct 12, 2018 at 1:58 AM Robert Bradshaw wrote: > I agree the

Re: [PROPOSAL] Prepare Beam 2.8.0 release

2018-10-12 Thread Ahmet Altay
Update: I re-cut the release branch. Only remaining issue on the 2.8.0 list is currently RabbitMQIO. JB, let me know if you would like to cherry pick that in to the release branch. On Wed, Oct 10, 2018 at 1:44 PM, Ahmet Altay wrote: > Given the number of open issues, I will re-cut the release

Re: [DISCUSS] Beam public roadmap

2018-10-12 Thread Kenneth Knowles
Personally, I think cwiki is best for dev community, while important stuff for users should go on the web site. But experimenting with the content on cwiki seems like a quick and easy thing to try out. On Fri, Oct 12, 2018 at 1:43 AM Maximilian Michels wrote: > Great idea, Kenn! > > How about

Re: [DISCUSS] Beam public roadmap

2018-10-12 Thread Tim Robertson
Thanks Kenn, I think this is a very good idea. My preference would be to make it part of the website rather than a wiki. Those who need to contribute can do so easily and I find wikis often get messy/stale/overwhelming. The website will also mean that we can use dev@ and Jira to track, discuss and help

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-12 Thread Charles Chen
What I mean is that a user may find that it works for them to pass "--myarg blah" and access it as "options.myarg" without explicitly defining a "my_arg" flag due to the added logic. This is not the intended behavior and we may want to change this implementation detail in the future. However,

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-12 Thread Thomas Weise
Can you please elaborate more what practical problems this introduces for users? I can see that this change allows a user to specify a runner specific option, which in the future may change because we decide to scope differently. If this only affects users of the portable Flink runner (like us),

Java postcommits duration almost hit 4 hours

2018-10-12 Thread Mikhail Gryzykhin
Hi everyone, I just wanted to highlight an interesting fact: Our Java postcommits duration almost *doubled* since last week, rising from 2.2 to nearly 4 hours. (See bottom-left graph on this dashboard) We might want to check on the

contributor permission for Beam Jira tickets

2018-10-12 Thread Hai Lu
Hi, This is Hai from LinkedIn. I'm closely working with Xinyu on portable API for Samza runner. Can someone add me as a contributor for Beam's Jira issue tracker? I would like to create/assign tickets for my work. Thanks, Hai

Re: contributor permission for Beam Jira tickets

2018-10-12 Thread Hai Lu
Sorry I forgot to mention my Jira ID, it's lhaiesp Thanks, Hai On Fri, Oct 12, 2018 at 2:38 PM Kenneth Knowles wrote: > Hi Hai, > > Have you created an account? A search for your name did not turn up > anything. If you tell me your Jira ID I can add you. > > Kenn > > On Fri, Oct 12, 2018 at

Re: contributor permission for Beam Jira tickets

2018-10-12 Thread Kenneth Knowles
Done. Welcome! On Fri, Oct 12, 2018 at 2:57 PM Hai Lu wrote: > Sorry I forgot to mention my Jira ID, it's lhaiesp > > Thanks, > Hai > > On Fri, Oct 12, 2018 at 2:38 PM Kenneth Knowles wrote: > >> Hi Hai, >> >> Have you created an account? A search for your name did not turn up >> anything. If

Re: Java postcommits duration almost hit 4 hours

2018-10-12 Thread Anton Kedin
Not sure where other perf issues are coming from, but this specific BQ test suite was disabled yesterday: https://github.com/apache/beam/pull/6658 On Fri, Oct 12, 2018 at 3:20 PM Kenneth Knowles wrote: > Nice catch. Here is a build that went from 2.5 to 3 hours: >

Re: Java postcommits duration almost hit 4 hours

2018-10-12 Thread Pablo Estrada
We added some BigQuery tests, which are notably slow. My plan is to add a new test suite to run GCP tests, but I haven't gotten around to it yet. Best -P. On Fri, Oct 12, 2018, 4:01 PM Anton Kedin wrote: > Not sure where other perf issues are coming from, but this specific BQ > test suite was disabled

Re: [DISCUSS] Beam public roadmap

2018-10-12 Thread Kenneth Knowles
Did some searching about to see what other projects have done. Most OSS projects with open governance don't actually have such a thing AFAICT. Here are some from various [types of] projects. Please contribute links for any project you can think of that might be interesting examples. My personal

Re: [DISCUSS] Beam public roadmap

2018-10-12 Thread Kenneth Knowles
I think we can easily steer clear of those concerns. It should not look like a company's roadmap. This is just a term that users search for and ask for. It might be an incremental improvement on https://beam.apache.org/contribute/#works-in-progress to present it more for users, to just give them a

Re: contributor permission for Beam Jira tickets

2018-10-12 Thread Kenneth Knowles
Hi Hai, Have you created an account? A search for your name did not turn up anything. If you tell me your Jira ID I can add you. Kenn On Fri, Oct 12, 2018 at 2:16 PM Hai Lu wrote: > Hi, > > This is Hai from LinkedIn. I'm closely working with Xinyu on portable API > for Samza runner. Can

Re: Java postcommits duration almost hit 4 hours

2018-10-12 Thread Kenneth Knowles
Nice catch. Here is a build that went from 2.5 to 3 hours: https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Java_GradleBuild/1654/ looks like it added some BQ tests. Not sure that can account for it. From there it was red for some time and slow once it went green again and was

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-12 Thread Thomas Weise
Thanks, will tag you, and I'm looking forward to feedback so we can ensure that changes work for everyone. Looking at the PR, I see agreement from Max to revert the change on the release branch, but not in master. Would you mind restoring it in master? Thanks On Fri, Oct 12, 2018 at 4:40 PM Ahmet

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-12 Thread Charles Chen
The current release branch (https://github.com/apache/beam/commits/release-2.8.0) was cut after the revert went in. Sent out https://github.com/apache/beam/pull/6683 as a revert of the revert. Regarding your comment above, I can help out with the design / PR reviews for common Python code as

Re: [BEAM-5442] Store duplicate unknown (runner) options in a list argument

2018-10-12 Thread Ahmet Altay
On Fri, Oct 12, 2018 at 11:31 AM, Charles Chen wrote: > What I mean is that a user may find that it works for them to pass > "--myarg blah" and access it as "options.myarg" without explicitly defining > a "my_arg" flag due to the added logic. This is not the intended behavior > and we may want

Re: Beam Samza Runner status update

2018-10-12 Thread Maximilian Michels
Thanks for the update, Xinyu and Hai! Great to see another Runner emerging :) I'm on the FlinkRunner. Looking forward to working together with you to make the Beam Runners even better. Particularly, we should sync on the portability, as some things are still to be fleshed out. In Flink, we

Re: Splitting the repo

2018-10-12 Thread Robert Bradshaw
On Wed, Oct 10, 2018 at 9:21 PM Kenneth Knowles wrote: > > I think Robert's initial question needs to be focused on a particular split. Yes, thanks for bringing this back to the original question. > I agree that a "single project spanning multiple repos" does not make sense. > But separate

Re: [DISCUSS] Beam public roadmap

2018-10-12 Thread Maximilian Michels
Great idea, Kenn! How about putting the roadmap in the Confluence wiki? We can link the page from the web site. The timeline should not be too specific but should give users an idea of what to expect. On 10.10.18 22:43, Romain Manni-Bucau wrote: > What about a link in the menu. It should

Re: post-commit failure emails

2018-10-12 Thread Robert Bradshaw
I agree the Jenkins emails are spammy (to the point that I honestly can't follow all of them). +1 to emailing "suspects" as defined by those that impacted the build in the time it turned from green to red. On Fri, Oct 12, 2018 at 12:55 AM Udi Meiri wrote: > > The email trigger is set up to trigger on

Re: Python SDK: .options deprecation

2018-10-12 Thread Robert Bradshaw
Correct. Among other things, we don't want to expose the choice of runner during pipeline construction (perhaps it's even deferred), or characteristics like streaming vs. batch (the runner should be able to make this choice on its own). This was not yet pushed all the way through in Python as it