Re: Another another new contributor! :)

2019-02-06 Thread Reza Ardeshir Rokni
Welcome!

On Tue, 5 Feb 2019 at 23:34, Kenneth Knowles  wrote:

> Welcome Kyle!
>
> On Tue, Feb 5, 2019 at 4:34 AM Maximilian Michels  wrote:
>
>> Welcome Kyle! Excited to see the Spark Runner moving towards portability!
>>
>> On 05.02.19 01:14, Connell O'Callaghan wrote:
>> > Welcome Kyle!
>> >
>> > On Mon, Feb 4, 2019 at 3:18 PM Ahmet Altay wrote:
>> >
>> > Welcome!
>> >
>> > On Mon, Feb 4, 2019 at 3:13 PM Rui Wang wrote:
>> >
>> > Welcome!
>> >
>> > -Rui
>> >
>> > On Mon, Feb 4, 2019 at 2:50 PM Kyle Weaver wrote:
>> >
>> > Hello Beam developers,
>> >
>> > My name is Kyle Weaver (alias "ibzib" on Github/Slack). Like
>> > Brian, I recently switched roles at Google (I previously
>> > worked on Prow, Kubernetes' CI system). My goal in the
>> > coming weeks is to help begin implementing portability
>> > support for the Spark runner. I look forward to
>> > collaborating with all of you!
>> >
>> > Kyle
>> >
>> > Kyle Weaver | Software Engineer | kcwea...@google.com | +1650203
>> >
>> >
>>
>


Re: [DISCUSS] (Forked thread) Beam issue triage & assignees

2019-02-06 Thread Kenneth Knowles
I re-triaged most issues where the creation date != last update. I worked
through everyone with more issues than myself (which I have triaged
regularly) and a few people with a few fewer issues.

I didn't look as closely at issues that were filed by the assignee. So if
you filed a bunch of issues that landed on yourself, take a look.

If you have fewer than 30 issues assigned to you, please take a look at
them now.
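
For reference, a minimal sketch of the "more than 30 issues" rule from this
thread, assuming the open issues are available as (issue key, assignee)
pairs. The function name and data shape are hypothetical and no Jira API is
called; only the threshold of 30 comes from the discussion.

```python
from collections import Counter


def assignees_to_unassign(issues, max_assigned=30):
    """Return assignees holding more than max_assigned open issues.

    issues: iterable of (issue_key, assignee) pairs; assignee may be None.
    """
    # Count open issues per assignee, skipping unassigned issues.
    counts = Counter(assignee for _key, assignee in issues if assignee)
    return sorted(name for name, n in counts.items() if n > max_assigned)


# Hypothetical data: alice holds 31 issues, bob holds 1, one is unassigned.
issues = [(f"BEAM-{i}", "alice") for i in range(31)]
issues += [("BEAM-100", "bob"), ("BEAM-101", None)]
print(assignees_to_unassign(issues))
```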

Kenn

On Wed, Feb 6, 2019 at 8:15 PM Kenneth Knowles  wrote:

> While we work with infra on this, let's remove the broken system and use
> tags. It is important that issues coming in are known to be untriaged, so
> instead of a "Needs Triage" label, we should use "triaged". So I will take
> these actions that everyone seems to agree on:
>
>  - Remove default assignment from Jira configs
>  - Unassign all issues from people with a huge number
>  - Add "triaged" tag to issues that are assigned and have some meaningful
> recent activity
>
> I will use trial-and-error to figure out what looks OK for "huge number"
> and "meaningful recent activity".
>
> Kenn
>
> On Fri, Jan 11, 2019 at 3:20 PM Kenneth Knowles  wrote:
>
>> Filed https://issues.apache.org/jira/browse/INFRA-17628 for the new
>> status. The rest of 1-3 is self-service I think. I expect step 4 and 5 will
>> need INFRA as well, but I/we should do what we can to make a very clear
>> request.
>>
>> On Fri, Jan 11, 2019 at 12:54 PM Kenneth Knowles  wrote:
>>
>>> It sounds like there's a lot of consensus, pretty much on the action
>>> items that Max and Ahmet suggested. I will start on these first steps if no
>>> one objects:
>>>
>>> 0) Add a Needs Review status to our workflow
>>> 1) Change new issues to be Unassigned and to be in status "Needs Review"
>>> 2) Unassign all issues from folks with > 30
>>>
>>> And I'm not sure if folks had more to say on these:
>>>
>>> 3) Use Wiki of multiple committers per component rather than Jira
>>> component owners
>>> 4) Automatically unassign stale issues that are just sitting on an
>>> assignee
>>> 5) Look into SLOs per issue priority and see how we can surface SLO
>>> violations (reports and pings)
>>>
>>> Kenn
>>>
>>> On Thu, Jan 10, 2019 at 11:41 AM Scott Wegner 
>>> wrote:
>>>
 +1

 > 3) Ensure that each component's unresolved issues get looked at
 regularly

 This is ideal, but I also don't know how to get to this state. Starting
 with clear component ownership and expectations will help. If the triaging
 process is well-defined, then members of the community can help for any
 components which need additional support.

 On Thu, Jan 10, 2019 at 12:21 AM Mikhail Gryzykhin <
 gryzykhin.mikh...@gmail.com> wrote:

> +1 to keep issues unassigned and reevaluate backlog from time to time.
>
> We can also auto-unassign if there was no activity on ticket for N
> days. Or we can have auto-mailed report that highlights stale assigned
> issues.
>
> On Thu, Jan 10, 2019 at 12:10 AM Robert Bradshaw 
> wrote:
>
>> On Thu, Jan 10, 2019 at 3:20 AM Ahmet Altay  wrote:
>> >
>> > I agree with the proposals here. Initial state of "Needs Review" and
>> > blocking releases on untriaged issues will ensure that we will at least
>> > look at every new issue once.
>>
>> +1.
>>
>> I'm more ambivalent about closing stale issues. Unlike PRs, issues can
>> be filed as "we should (not forget to) do this" much sooner than
>> they're actively worked on.
>>
>> > On Wed, Jan 9, 2019 at 10:30 AM Maximilian Michels 
>> wrote:
>> >>
>> >> Hi Kenn,
>> >>
>> >> As your data shows, default-assigning issues to a single person does not
>> >> automatically solve triaging issues. Quite the contrary, it hides the
>> >> triage status of an issue.
>> >>
>> >> From the perspective of the Flink Runner, we used to auto-assign but we
>> >> got rid of this. Instead, we monitor the newly coming issues and take
>> >> actions. We also go through the old ones occasionally. I believe that
>> >> works fine for us.
>> >>
>> >> The Flink project itself also does not default-assign; newly created
>> >> issues are unassigned. There are component leads overseeing issues. There
>> >> is no guarantee that every issue gets triaged.
>> >>
>> >> "Needs Triage" or "Needs Review" (seems easier to understand for
>> >> non-native speakers) sounds like a good addition, but it will not solve
>> >> the problem that issues need to be curated and maintained after the
>> >> initial triage. For example, I've seen issues not closed after they have
>> >> been fixed via a PR. However, "Needs Triage" will ensure that all issues
>> >> get looked at. This could be helpful for releases, if not-yet-triaged
>> >> issues are looked at early enough.
>> >>
>> >> I'd suggest to:

Re: [DISCUSS] (Forked thread) Beam issue triage & assignees

2019-02-06 Thread Kenneth Knowles
While we work with infra on this, let's remove the broken system and use
tags. It is important that issues coming in are known to be untriaged, so
instead of a "Needs Triage" label, we should use "triaged". So I will take
these actions that everyone seems to agree on:

 - Remove default assignment from Jira configs
 - Unassign all issues from people with a huge number
 - Add "triaged" tag to issues that are assigned and have some meaningful
recent activity

I will use trial-and-error to figure out what looks OK for "huge number"
and "meaningful recent activity".

Kenn

On Fri, Jan 11, 2019 at 3:20 PM Kenneth Knowles  wrote:

> Filed https://issues.apache.org/jira/browse/INFRA-17628 for the new
> status. The rest of 1-3 is self-service I think. I expect step 4 and 5 will
> need INFRA as well, but I/we should do what we can to make a very clear
> request.
>
> On Fri, Jan 11, 2019 at 12:54 PM Kenneth Knowles  wrote:
>
>> It sounds like there's a lot of consensus, pretty much on the action
>> items that Max and Ahmet suggested. I will start on these first steps if no
>> one objects:
>>
>> 0) Add a Needs Review status to our workflow
>> 1) Change new issues to be Unassigned and to be in status "Needs Review"
>> 2) Unassign all issues from folks with > 30
>>
>> And I'm not sure if folks had more to say on these:
>>
>> 3) Use Wiki of multiple committers per component rather than Jira
>> component owners
>> 4) Automatically unassign stale issues that are just sitting on an
>> assignee
>> 5) Look into SLOs per issue priority and see how we can surface SLO
>> violations (reports and pings)
>>
>> Kenn
>>
>> On Thu, Jan 10, 2019 at 11:41 AM Scott Wegner  wrote:
>>
>>> +1
>>>
>>> > 3) Ensure that each component's unresolved issues get looked at
>>> regularly
>>>
>>> This is ideal, but I also don't know how to get to this state. Starting
>>> with clear component ownership and expectations will help. If the triaging
>>> process is well-defined, then members of the community can help for any
>>> components which need additional support.
>>>
>>> On Thu, Jan 10, 2019 at 12:21 AM Mikhail Gryzykhin <
>>> gryzykhin.mikh...@gmail.com> wrote:
>>>
 +1 to keep issues unassigned and reevaluate backlog from time to time.

 We can also auto-unassign if there was no activity on ticket for N
 days. Or we can have auto-mailed report that highlights stale assigned
 issues.

 On Thu, Jan 10, 2019 at 12:10 AM Robert Bradshaw 
 wrote:

> On Thu, Jan 10, 2019 at 3:20 AM Ahmet Altay  wrote:
> >
> > I agree with the proposals here. Initial state of "Needs Review" and
> > blocking releases on untriaged issues will ensure that we will at least
> > look at every new issue once.
>
> +1.
>
> I'm more ambivalent about closing stale issues. Unlike PRs, issues can
> be filed as "we should (not forget to) do this" much sooner than
> they're actively worked on.
>
> > On Wed, Jan 9, 2019 at 10:30 AM Maximilian Michels 
> wrote:
> >>
> >> Hi Kenn,
> >>
> >> As your data shows, default-assigning issues to a single person does not
> >> automatically solve triaging issues. Quite the contrary, it hides the
> >> triage status of an issue.
> >>
> >> From the perspective of the Flink Runner, we used to auto-assign but we
> >> got rid of this. Instead, we monitor the newly coming issues and take
> >> actions. We also go through the old ones occasionally. I believe that
> >> works fine for us.
> >>
> >> The Flink project itself also does not default-assign; newly created
> >> issues are unassigned. There are component leads overseeing issues. There
> >> is no guarantee that every issue gets triaged.
> >>
> >> "Needs Triage" or "Needs Review" (seems easier to understand for
> >> non-native speakers) sounds like a good addition, but it will not solve
> >> the problem that issues need to be curated and maintained after the
> >> initial triage. For example, I've seen issues not closed after they have
> >> been fixed via a PR. However, "Needs Triage" will ensure that all issues
> >> get looked at. This could be helpful for releases, if not-yet-triaged
> >> issues are looked at early enough.
> >>
> >> I'd suggest to:
> >>
> >> 1) Change new issues to be Unassigned and to be in status "Needs Review"
> >> 2) Remove Assignees from all not-being-worked-on issues
> >
> >
> > For the existing issues, I suggest unassigning all issues assigned to
> > people with > N issues for a large N. Something like 30, > 1% of all
> > issues. There are also issues assigned to people who are no longer active
> > in the community. We could unassign those as well.
> >
> > Another issue is that the average age of open issues is ever growing and
> > is now over 300 days. It would be nice if we can have an

Re: [VOTE] Release 2.10.0, release candidate #3

2019-02-06 Thread Reuven Lax
+1 (binding)

On Wed, Feb 6, 2019 at 2:28 PM Kenneth Knowles  wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #3 for the version 2.10.0,
> as follows:
>
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org [2],
> which is signed with the key with fingerprint 6ED551A8AE02461C [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.10.0-RC3" [5],
> * website pull request listing the release [6] and publishing the API
> reference manual [7].
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> * Validation sheet with a tab for 2.10.0 release to help with validation
> [8].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Kenn
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344540
> [2] https://dist.apache.org/repos/dist/dev/beam/2.10.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1058/
> [5] https://github.com/apache/beam/tree/v2.10.0-RC3
> [6] https://github.com/apache/beam/pull/7651/files
> [7] https://github.com/apache/beam-site/pull/586
> [8]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>
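
The adoption rule quoted above can be sketched as follows. This assumes votes
are recorded as (value, binding) pairs and reads "majority approval" as more
binding +1 than binding -1 votes; that reading and the function name are
assumptions, not something this thread spells out.

```python
def vote_passes(votes, min_binding_plus_ones=3):
    """Check a release vote against the rule quoted in the vote email.

    votes: iterable of (value, binding) pairs, value is +1, 0 or -1 and
    binding is True for PMC votes.
    """
    plus = sum(1 for value, binding in votes if binding and value == 1)
    minus = sum(1 for value, binding in votes if binding and value == -1)
    # At least 3 PMC affirmative votes, and more approvals than rejections.
    return plus >= min_binding_plus_ones and plus > minus


# Hypothetical tally: three binding +1 votes and one non-binding +1.
print(vote_passes([(1, True), (1, True), (1, True), (1, False)]))
```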


Re: 2.7.1 (LTS) release?

2019-02-06 Thread Kenneth Knowles
Having gone through the release process, I have a couple of git drawings to
share. Currently the release process looks like this (you'll have to view
in fixed width font if it is stripped by the mail manager).

-X master
   \
---Y-a--b---c- release-2.10.0

*   X: commit that updates master from 2.10.0-SNAPSHOT to 2.11.0-SNAPSHOT
(Python calls it 2.10.0dev, etc per lang, and we wrote a script for it)
*   The release branch starts from the parent of X
*   Y: changes Python version to 2.10.0 (no dev) and you'll see why
*   On release branch, version is still 2.10.0-SNAPSHOT for Java
*   a, b, c: the gradle release plugin commits a change for Java to 2.10.0
then reverts it, and tags with RC1, RC2, RC3, etc. If the RC fails you have
to force reset and delete the tag.
*   The release script also builds from fresh clones, so this is all pushed
to GitHub. It can really clutter the history but is otherwise probably
harmless. Because of issues with scripting and gpg set up I had to build
maybe 10 "RCs" to roll RC2.

I think git can make this simpler. I would propose:

-X master
   \
--- release-2.10.0
 \  \  \
  a  b  c

*   X: same
*   Y: gone
*   On release branch, both Java and Python are -SNAPSHOT or dev, etc.
(and it could be release-2.10 that advances minor version in the commit
after a successful RC)
*   To build an RC, add the commits like a, b, c which remove -SNAPSHOT
and tag; we have a bash script that collects all the places that need
editing, the one that built commit X.
*   Whether to push the commit and tag first or build the RC first doesn't
matter that much but anyhow now it is off the history so it is fine to push.

Have I missed something vital about the current process?

Kenn



On Thu, Jan 31, 2019 at 8:49 PM Thomas Weise  wrote:

> Either looks fine to me. Same content, different label :)
>
>
> On Thu, Jan 31, 2019 at 6:32 PM Michael Luckey 
> wrote:
>
>> Thx Thomas for that clarification. I tried to express, I'd slightly
>> prefer to have branches
>>
>> 2.7.x
>> 2.8.x
>> 2.9.x
>>
>> and tags:
>> 2.7.0
>> 2.7.1
>> ...
>>
>> So only difference would be to be more explicit on the branch name, i.e.
>> that it embraces all the patch versions. (I do not know how to better
>> express, that '2.7.x' is a literal string and should not be confused as
>> some placeholder.)
>>
>> Regarding the versioning, I always prefer the explicit version including
>> patch version. It might make it easier to help and resolve issues if it is
>> known on which patch level a user is running. I spent a lot of lifetime
>> assuming some version and realising later it was 'just another snapshot'
>> version...
>>
>> Just my 2 ct... Also fine with the previous suggestion.
>>
>>
>>
>> On Fri, Feb 1, 2019 at 3:18 AM Thomas Weise  wrote:
>>
>>> Hi,
>>>
>>> As Kenn had already exemplified, the suggestion was to have branches:
>>>
>>> 2.7
>>> 2.8
>>> 2.9
>>> ...
>>>
>>> and tags:
>>>
>>> 2.7.0
>>> 2.7.1
>>> ...
>>> 2.8.0
>>> ...
>>>
>>> Changes would go to the 2.7 branch, at some point release 2.7.1 is
>>> created. Then more changes may accrue on the same branch, maybe at some
>>> point 2.7.2 is released and so on.
>>>
>>> We could also consider changing the snapshot version to 2.7-SNAPSHOT,
>>> instead of 2.7.{0,1,...}-SNAPSHOT.
>>>
>>> With that it wouldn't even be necessary to change the version number on
>>> the branch.
>>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>>
>>> On Thu, Jan 31, 2019 at 5:59 PM Michael Luckey 
>>> wrote:
>>>
 Ah, sorry, I misread that.

 I slightly prefer the branch to have that '.x' suffix, as it is
 slightly more explicit. But technically there will be no difference.

 On Fri, Feb 1, 2019 at 2:55 AM Chamikara Jayalath 
 wrote:

> Sorry, what I meant was branches+tags for each minor version release
> and adding updates and tags to the same branch for patch releases. Name of
> the branch can be release-2.X for minor version release 2.X.0 as Thomas
> mentioned.
>
> - Cham
>
> On Thu, Jan 31, 2019 at 5:46 PM Michael Luckey 
> wrote:
>
>> Maybe we should not go so far as to name branches 2.x. This will
>> probably make it difficult to support more than 1 LTS. Don't know whether
>> we ever intend to do so, but supporting 2.7 and 2.13 on a 2.x branch seems
>> difficult?
>>
>> A more explicit 2.7.x with tags 2.7.1, 2.7.2 etc might be better? If
>> we are going to support a second LTS later on, we could just add that
>> 2.??.x branch.
>>
>> michel
>>
>> On Fri, Feb 1, 2019 at 2:37 AM Chamikara Jayalath <
>> chamik...@google.com> wrote:
>>
>>> +1 for 2.x branches and tags for 2.x.y releases.
>>>
>>> Also, I think we should integrate the dependency upgrade
>>> https://issues.apache.org/jira/browse/BEAM-6552 to 

Re: [VOTE] Release 2.10.0, release candidate #2

2019-02-06 Thread Maximilian Michels

Thank you. Here it is: https://github.com/apache/beam/pull/7753

On 06.02.19 18:30, Kenneth Knowles wrote:
OK. Canceling this vote. Can you please simultaneously open a cherrypick 
so we can move it along at the same time?


On Wed, Feb 6, 2019 at 9:25 AM Kenneth Knowles wrote:


Quick clarification: I linked to the wrong verification spreadsheet
tab. The one for 2.10.0 is

https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=467787719

On Wed, Feb 6, 2019 at 7:33 AM Maximilian Michels <m...@apache.org> wrote:

- Ran Flink WordCount with Quickstart guide
- Ran release testing scripts for Flink

Discovered a regression:
https://jira.apache.org/jira/browse/BEAM-6608

If there is another blocker for the release, I'd like to fix this
for RC3. PR is already out.

Thanks,
Max

On 06.02.19 11:24, Robert Bradshaw wrote:
 > +1.
 >
 > I verified the source artifacts look good, and tried the
Python wheels.
 >
 > On Tue, Feb 5, 2019 at 11:57 PM Kenneth Knowles <k...@apache.org> wrote:
 >>
 >> Hi everyone,
 >>
 >> Please review and vote on the release candidate #2 for the
version 2.10.0, as follows:
 >>
 >> [ ] +1, Approve the release
 >> [ ] -1, Do not approve the release (please provide specific
comments)
 >>
 >> The complete staging area is available for your review,
which includes:
 >> * JIRA release notes [1],
 >> * the official Apache source release to be deployed to
dist.apache.org  [2], which is signed
with the key with fingerprint 6ED551A8AE02461C [3],
 >> * all artifacts to be deployed to the Maven Central
Repository [4],
 >> * source code tag "v2.10.0-RC1" [5],
 >> * website pull request listing the release [6] and
publishing the API reference manual [7].
 >> * Python artifacts are deployed along with the source
release to the dist.apache.org  [2].
 >> * Validation sheet with a tab for 2.10.0 release to help
with validation [8].
 >>
 >> The vote will be open for at least 72 hours. It is adopted
by majority approval, with at least 3 PMC affirmative votes.
 >>
 >> Thanks,
 >> Kenn
 >>
 >> [1]

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344540
 >> [2] https://dist.apache.org/repos/dist/dev/beam/2.10.0/
 >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
 >> [4]
https://repository.apache.org/content/repositories/orgapachebeam-1057/
 >> [5] https://github.com/apache/beam/tree/v2.10.0-RC2
 >> [6] https://github.com/apache/beam/pull/7651/files
 >> [7] https://github.com/apache/beam-site/pull/586
 >> [8]

https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529



Re: [VOTE] Release 2.10.0, release candidate #2

2019-02-06 Thread Kenneth Knowles
OK. Canceling this vote. Can you please simultaneously open a cherrypick so
we can move it along at the same time?

On Wed, Feb 6, 2019 at 9:25 AM Kenneth Knowles  wrote:

> Quick clarification: I linked to the wrong verification spreadsheet tab.
> The one for 2.10.0 is
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=467787719
>
> On Wed, Feb 6, 2019 at 7:33 AM Maximilian Michels  wrote:
>
>> - Ran Flink WordCount with Quickstart guide
>> - Ran release testing scripts for Flink
>>
>> Discovered a regression: https://jira.apache.org/jira/browse/BEAM-6608
>>
>> If there is another blocker for the release, I'd like to fix this
>> for RC3. PR is already out.
>>
>> Thanks,
>> Max
>>
>> On 06.02.19 11:24, Robert Bradshaw wrote:
>> > +1.
>> >
>> > I verified the source artifacts look good, and tried the Python wheels.
>> >
>> > On Tue, Feb 5, 2019 at 11:57 PM Kenneth Knowles 
>> wrote:
>> >>
>> >> Hi everyone,
>> >>
>> >> Please review and vote on the release candidate #2 for the version
>> 2.10.0, as follows:
>> >>
>> >> [ ] +1, Approve the release
>> >> [ ] -1, Do not approve the release (please provide specific comments)
>> >>
>> >> The complete staging area is available for your review, which includes:
>> >> * JIRA release notes [1],
>> >> * the official Apache source release to be deployed to dist.apache.org
>> [2], which is signed with the key with fingerprint 6ED551A8AE02461C [3],
>> >> * all artifacts to be deployed to the Maven Central Repository [4],
>> >> * source code tag "v2.10.0-RC2" [5],
>> >> * website pull request listing the release [6] and publishing the API
>> reference manual [7].
>> >> * Python artifacts are deployed along with the source release to the
>> dist.apache.org [2].
>> >> * Validation sheet with a tab for 2.10.0 release to help with
>> >> validation [8].
>> >>
>> >> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>> >>
>> >> Thanks,
>> >> Kenn
>> >>
>> >> [1]
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344540
>> >> [2] https://dist.apache.org/repos/dist/dev/beam/2.10.0/
>> >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> >> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1057/
>> >> [5] https://github.com/apache/beam/tree/v2.10.0-RC2
>> >> [6] https://github.com/apache/beam/pull/7651/files
>> >> [7] https://github.com/apache/beam-site/pull/586
>> >> [8]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>>
>


Re: Proposal: Portability SDKHarness Docker Image Release with Beam Version Release.

2019-02-06 Thread Łukasz Gajowy
+1 to have a registry for images accessible to anyone. For snapshot images,
I agree that gcr + apache-beam-testing project seems a good and easy way to
start with.

Łukasz

On Tue, 22 Jan 2019 at 19:43, Mark Liu wrote:

> +1 to have an official Beam released container image.
>
> Also I would propose to add a verification step to (or after) the release
> process to do a smoke check. Python has a ValidatesContainer test that runs
> a basic pipeline using the newly built container for verification. Other SDK
> languages can do a similar thing or add a common framework.
>
> Mark
>
> On Thu, Jan 17, 2019 at 5:56 AM Alan Myrvold  wrote:
>
>> +1 This would be great. gcr.io seems like a good option for snapshots
>> due to the permissions from jenkins to upload and ability to keep snapshots
>> around.
>>
>> On Wed, Jan 16, 2019 at 6:51 PM Ruoyun Huang  wrote:
>>
>>> +1 This would be a great thing to have.
>>>
>>> On Wed, Jan 16, 2019 at 6:11 PM Ankur Goenka  wrote:
>>>
 gcr.io seems to be a good option. Since we don't need the hosting
 server name in the image name, the host is easy to change later.

 The Docker container for Apache Flink is named "flink" and they have
 different tags for different releases and configurations:
 https://hub.docker.com/_/flink . We can follow a similar model and
 name the image "beam" (beam doesn't seem to be taken on Docker Hub) and
 use tags to distinguish Java/Python/Go and versions etc.

 Tags will look like:
 java-SNAPSHOT
 java-2.10.1
 python2-SNAPSHOT
 python2-2.10.1
 go-SNAPSHOT
 go-2.10.1
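
 The tag scheme listed above could be generated with something like the
 following sketch. The function name is hypothetical; only the
 "<sdk>-SNAPSHOT" and "<sdk>-<version>" shapes come from the list above.

```python
def image_tags(sdks, version=None):
    """Build tags of the form <sdk>-SNAPSHOT or <sdk>-<version>."""
    # Without an explicit release version, fall back to the snapshot tag.
    suffix = version if version else "SNAPSHOT"
    return [f"{sdk}-{suffix}" for sdk in sdks]


sdks = ["java", "python2", "go"]
print(image_tags(sdks))
print(image_tags(sdks, "2.10.1"))
```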


 On Wed, Jan 16, 2019 at 5:56 PM Ahmet Altay  wrote:

> For snapshots, we could use gcr.io. Permission would not be a problem
> since Jenkins is already correctly setup. The cost will be covered under
> apache-beam-testing project. And since this is only for snapshots, it will
> be only for temporary artifacts not for release artifacts.
>
> On Wed, Jan 16, 2019 at 5:50 PM Valentyn Tymofieiev <
> valen...@google.com> wrote:
>
>> +1, releasing containers is a useful process that we need to build in
>> Beam and it is required for FnApi users. Among other reasons, having
>> officially-released Beam SDK harness container images will make it easier
>> for users to do simple customizations to container images, as they will
>> be able to use the container image released by Beam as a base image.
>>
>> Good point about potential storage limitations on Bintray. With Beam
>> Release cadence we may quickly exceed the 10 GB quota. It may also affect
>> our decisions as to which images we want to release, for example: do we
>> want to only release one container image with Python 3 interpreter, or do
>> we want to release a container image for each Python 3 minor version that
>> Beam is compatible with.
>>
>
> Probably worth a separate discussion. I would favor first releasing a
> python 3 compatible version before figuring out how we would target
> multiple python 3 versions.
>

>
>>
>> On Wed, Jan 16, 2019 at 5:48 PM Ankur Goenka 
>> wrote:
>>
>>>
>>>
>>> On Wed, Jan 16, 2019 at 5:37 PM Ahmet Altay 
>>> wrote:
>>>


 On Wed, Jan 16, 2019 at 5:28 PM Ankur Goenka 
 wrote:

> - Could we start from snapshots first and then do it for releases?
> +1, releasing snapshots first makes sense to me.
> - For snapshots, do we need to clean old containers after a while?
> Otherwise I guess we will accumulate lots of containers.
> For snapshots we can maintain a single snapshot image from git
> HEAD daily. Docker has the internal image container id which changes
> every time an image is changed and pulls new images as needed.
>

 There is a potential use this may not work with: a user picks up
 a snapshot build and wants to use it until the next release arrives. I
 guess in that case the user can copy the snapshotted container image and
 rely on that.


>>> Yes, that should be reasonable.
>>>
 - Do we also need additional code changes for snapshots and
> releases to default to these specific containers? There could be a
> version-based mechanism to resolve the correct container to use.
> The current image defaults have username in it. We should be ok by
> just updating the default image url to published image url.
>
> We should also check for pricing and details about Apache-Bintray
> agreement before pushing images and changing defaults.
>

 There is information on bintray's pricing page about open source
 projects [1]. I do not know if there is a special apache-bintray 
 agreement
 or 

Re: [VOTE] Release 2.10.0, release candidate #2

2019-02-06 Thread Kenneth Knowles
Quick clarification: I linked to the wrong verification spreadsheet tab.
The one for 2.10.0 is
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=467787719

On Wed, Feb 6, 2019 at 7:33 AM Maximilian Michels  wrote:

> - Ran Flink WordCount with Quickstart guide
> - Ran release testing scripts for Flink
>
> Discovered a regression: https://jira.apache.org/jira/browse/BEAM-6608
>
> If there is another blocker for the release, I'd like to fix this
> for RC3. PR is already out.
>
> Thanks,
> Max
>
> On 06.02.19 11:24, Robert Bradshaw wrote:
> > +1.
> >
> > I verified the source artifacts look good, and tried the Python wheels.
> >
> > On Tue, Feb 5, 2019 at 11:57 PM Kenneth Knowles  wrote:
> >>
> >> Hi everyone,
> >>
> >> Please review and vote on the release candidate #2 for the version
> 2.10.0, as follows:
> >>
> >> [ ] +1, Approve the release
> >> [ ] -1, Do not approve the release (please provide specific comments)
> >>
> >> The complete staging area is available for your review, which includes:
> >> * JIRA release notes [1],
> >> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint 6ED551A8AE02461C [3],
> >> * all artifacts to be deployed to the Maven Central Repository [4],
> >> * source code tag "v2.10.0-RC2" [5],
> >> * website pull request listing the release [6] and publishing the API
> reference manual [7].
> >> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> >> * Validation sheet with a tab for 2.10.0 release to help with
> >> validation [8].
> >>
> >> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
> >>
> >> Thanks,
> >> Kenn
> >>
> >> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344540
> >> [2] https://dist.apache.org/repos/dist/dev/beam/2.10.0/
> >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> >> [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1057/
> >> [5] https://github.com/apache/beam/tree/v2.10.0-RC2
> >> [6] https://github.com/apache/beam/pull/7651/files
> >> [7] https://github.com/apache/beam-site/pull/586
> >> [8]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>


Re: Resource usage exceeded: topics-per-project

2019-02-06 Thread Kenneth Knowles
Ah, I jumped to the wrong vocabulary. There is no auto-created topic. That
is part of TestPubsub and TestPubsubSignal. It should be cleaned up. The
only use of those is in SQL ITs. But PubsubIO _should_ use it.
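
For the cleanup itself, a minimal sketch of how leaked test topics might be
picked out by the timestamp embedded in their names. It assumes the name
layout of the example topic quoted below and a hypothetical one-day age
limit; it only filters names and makes no Pub/Sub API calls.

```python
import re
from datetime import datetime, timedelta

# Matches the timestamp part of a name, e.g. "-2018-08-27-16-55-44-342-".
_STAMP = re.compile(r"-(\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})-\d{3}-")


def topic_created_at(topic_name):
    """Parse the creation time out of a test topic name, or return None."""
    match = _STAMP.search(topic_name)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d-%H-%M-%S")


def stale_topics(topic_names, now, max_age=timedelta(days=1)):
    """Return the names whose embedded timestamp is older than max_age."""
    stale = []
    for name in topic_names:
        created = topic_created_at(name)
        if created is not None and now - created > max_age:
            stale.append(name)
    return stale


example = ("projects/apache-beam-testing/topics/integ-test-PubsubJsonIT-"
           "testSQLLimit-2018-08-27-16-55-44-342-start-7392789257486934721")
print(stale_topics([example], now=datetime(2019, 2, 6)))
```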

Kenn

On Wed, Feb 6, 2019 at 9:04 AM Kenneth Knowles  wrote:

> To clarify, PubsubIO does not clean up auto-created subscriptions, and SQL
> doesn't compensate for that.
>
> On Wed, Feb 6, 2019 at 8:45 AM Mikhail Gryzykhin 
> wrote:
>
>> Thank you for the quick response, Andrew.
>>
>> I'll clean these up. I'll keep the bug open and assign it to @Kenneth
>> Knowles, who's working on SQL, for follow-up: we need a
>> way to automatically clean up topics.
>>
>> Current suggestions:
>> 1. Make SQL clean up created topics
>> 2. Clean up topics created by SQL in tests
>> 3. Increase quota so that topics have enough time to be cleaned up
>> automatically.
>>
>> Regards,
>> --Mikhail
>>
>> Have feedback ?
>>
>>
>> On Wed, Feb 6, 2019 at 8:05 AM Andrew Pilloud 
>> wrote:
>>
>>> SQL doesn't cleanup pubsub subscriptions. Feel free to delete those.
>>>
>>> Andrew
>>>
>>> On Wed, Feb 6, 2019, 8:01 AM Mikhail Gryzykhin 
>>> wrote:
>>>
 +Kenneth Knowles  you're working on SQL recently, so
 might provide some info.

 I see a lot of topics of format
 projects/apache-beam-testing/topics/integ-test-PubsubJsonIT-testSQLLimit-2018-08-27-16-55-44-342-start-7392789257486934721
 .
 Seems we do not cleanup properly.

 Is it safe to cleanup topics with this name?

 --Mikhail

 Have feedback ?


 On Wed, Feb 6, 2019 at 7:46 AM Mikhail Gryzykhin 
 wrote:

> Minor UPD:
> As expected it fails most of our test jobs, since we use Pub/Subs in
> many tests.
>
> --Mikhail
>
> Have feedback ?
>
>
> On Wed, Feb 6, 2019 at 7:38 AM Mikhail Gryzykhin 
> wrote:
>
>> Hi everyone,
>>
>> Our Python pipelines failed with a limit exceeded error:
>>
>> ResourceExhausted: 429 Your project has exceeded a limit: 
>> (type="topics-per-project", current=1, maximum=1).
>>
>>
>> Does anyone know if there were new tests that use topics added
>> recently?
>>
>> I tried to see the list of topics, but the UI fails to load. Will see if I
>> can use APIs to investigate.
>>
>> If anyone has good insight, please pick up BEAM-6610.
>>
>> --Mikhail
>>
>> Have feedback ?
>>
>


Re: Resource usage exceeded: topics-per-project

2019-02-06 Thread Andrew Pilloud
Oops, I'm mixing up terms here. Topics != Subscriptions. We shouldn't be
leaking topics.

Andrew

On Wed, Feb 6, 2019 at 9:04 AM Kenneth Knowles  wrote:

> To clarify, PubsubIO does not clean up auto-created subscriptions, and SQL
> doesn't compensate for that.
>
> On Wed, Feb 6, 2019 at 8:45 AM Mikhail Gryzykhin 
> wrote:
>
>> Thank you for quick response Andrew.
>>
>> I'll cleanup these. I'll keep the bug open and assign it to @Kenneth
>> Knowles who's working on SQL for follow up: we need a
>> way to automatically cleanup topics.
>>
>> Current suggestions:
>> 1. Make SQL cleanup created topics
>> 2. Cleanup topics created by SQL in tests
>> 3. Increase quota so that topics have enough time to be cleaned up
>> automatically.
>>
>> Regards,
>> --Mikhail
>>
>> Have feedback ?
>>
>>
>> On Wed, Feb 6, 2019 at 8:05 AM Andrew Pilloud 
>> wrote:
>>
>>> SQL doesn't cleanup pubsub subscriptions. Feel free to delete those.
>>>
>>> Andrew
>>>
>>> On Wed, Feb 6, 2019, 8:01 AM Mikhail Gryzykhin 
>>> wrote:
>>>
>>>> +Kenneth Knowles you're working on SQL recently, so
>>>> might provide some info.
>>>>
>>>> I see a lot of topics of format
>>>> rojects/apache-beam-testing/topics/integ-test-PubsubJsonIT-testSQLLimit-2018-08-27-16-55-44-342-start-7392789257486934721
>>>> .
>>>> Seems we do not cleanup properly.
>>>>
>>>> Is it safe to cleanup topics with this name?
>>>>
>>>> --Mikhail
>>>>
>>>> Have feedback ?
>>>>
>>>>
>>>> On Wed, Feb 6, 2019 at 7:46 AM Mikhail Gryzykhin
>>>> wrote:
>>>>
>>>>> Minor UPD:
>>>>> As expected it fails most of our test jobs, since we use Pub/Subs in
>>>>> many tests.
>>>>>
>>>>> --Mikhail
>>>>>
>>>>> Have feedback ?
>>>>>
>>>>>
>>>>> On Wed, Feb 6, 2019 at 7:38 AM Mikhail Gryzykhin
>>>>> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> Our python pipelines failed with limit exceeded error:
>>>>>>
>>>>>> ResourceExhausted: 429 Your project has exceeded a limit:
>>>>>> (type="topics-per-project", current=1, maximum=1).
>>>>>>
>>>>>>
>>>>>> Does anyone know if there were new tests that use topics added
>>>>>> recently?
>>>>>>
>>>>>> I tried to see list of topics, but UI fails to load. Will see if I can
>>>>>> use APIs to investigate.
>>>>>>
>>>>>> If anyone has good insight, please, pick up BEAM-6610.
>>>>>>
>>>>>> --Mikhail
>>>>>>
>>>>>> Have feedback ?
>>>>>>
>>>>>


Re: Resource usage exceeded: topics-per-project

2019-02-06 Thread Kenneth Knowles
To clarify, PubsubIO does not clean up auto-created subscriptions, and SQL
doesn't compensate for that.

On Wed, Feb 6, 2019 at 8:45 AM Mikhail Gryzykhin  wrote:

> Thank you for quick response Andrew.
>
> I'll cleanup these. I'll keep the bug open and assign it to @Kenneth
> Knowles who's working on SQL for follow up: we need a
> way to automatically cleanup topics.
>
> Current suggestions:
> 1. Make SQL cleanup created topics
> 2. Cleanup topics created by SQL in tests
> 3. Increase quota so that topics have enough time to be cleaned up
> automatically.
>
> Regards,
> --Mikhail
>
> Have feedback ?
>
>
> On Wed, Feb 6, 2019 at 8:05 AM Andrew Pilloud  wrote:
>
>> SQL doesn't cleanup pubsub subscriptions. Feel free to delete those.
>>
>> Andrew
>>
>> On Wed, Feb 6, 2019, 8:01 AM Mikhail Gryzykhin  wrote:
>>
>>> +Kenneth Knowles  you're working on SQL recently, so
>>> might provide some info.
>>>
>>> I see a lot of topics of format
>>> rojects/apache-beam-testing/topics/integ-test-PubsubJsonIT-testSQLLimit-2018-08-27-16-55-44-342-start-7392789257486934721
>>> .
>>> Seems we do not cleanup properly.
>>>
>>> Is it safe to cleanup topics with this name?
>>>
>>> --Mikhail
>>>
>>> Have feedback ?
>>>
>>>
>>> On Wed, Feb 6, 2019 at 7:46 AM Mikhail Gryzykhin 
>>> wrote:
>>>
>>>> Minor UPD:
>>>> As expected it fails most of our test jobs, since we use Pub/Subs in
>>>> many tests.
>>>>
>>>> --Mikhail
>>>>
>>>> Have feedback ?
>>>>
>>>>
>>>> On Wed, Feb 6, 2019 at 7:38 AM Mikhail Gryzykhin
>>>> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> Our python pipelines failed with limit exceeded error:
>>>>>
>>>>> ResourceExhausted: 429 Your project has exceeded a limit:
>>>>> (type="topics-per-project", current=1, maximum=1).
>>>>>
>>>>>
>>>>> Does anyone know if there were new tests that use topics added
>>>>> recently?
>>>>>
>>>>> I tried to see list of topics, but UI fails to load. Will see if I can
>>>>> use APIs to investigate.
>>>>>
>>>>> If anyone has good insight, please, pick up BEAM-6610.
>>>>>
>>>>> --Mikhail
>>>>>
>>>>> Have feedback ?
>>>>>



Re: Resource usage exceeded: topics-per-project

2019-02-06 Thread Mikhail Gryzykhin
Thank you for the quick response, Andrew.

I'll clean these up. I'll keep the bug open and assign it to @Kenneth Knowles,
who's working on SQL, for follow-up: we need a way to clean up topics
automatically.

Current suggestions:
1. Make SQL clean up the topics it creates
2. Clean up topics created by SQL in tests
3. Increase the quota so that topics have enough time to be cleaned up
automatically.
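
To make suggestion 2 concrete, here is a rough sketch of the shape I have in mind: a test helper that creates a uniquely named topic and always deletes it, even when the test body fails. Note that `FakePubsubClient`, its `create_topic`/`delete_topic` methods, and the project name are hypothetical stand-ins for illustration only, not the real Pub/Sub client API; only the try/finally structure is the point.

```python
import contextlib
import uuid


class FakePubsubClient:
    """Hypothetical in-memory stand-in for a Pub/Sub admin client."""

    def __init__(self):
        self.topics = set()

    def create_topic(self, name):
        self.topics.add(name)

    def delete_topic(self, name):
        self.topics.discard(name)


@contextlib.contextmanager
def temporary_topic(client, project, prefix="integ-test"):
    """Create a uniquely named topic and delete it even if the test fails."""
    name = "projects/%s/topics/%s-%s" % (project, prefix, uuid.uuid4().hex)
    client.create_topic(name)
    try:
        yield name
    finally:
        # Runs on success and on failure, so no topic is left behind.
        client.delete_topic(name)


def run_example():
    """Simulate a failing test and return the topics that remain afterwards."""
    client = FakePubsubClient()
    try:
        with temporary_topic(client, "my-test-project") as topic:
            assert topic in client.topics
            raise RuntimeError("simulated test failure")
    except RuntimeError:
        pass
    return client.topics
```

With this shape a crashed test no longer counts against the topics-per-project quota, which would make suggestion 3 less urgent.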

Regards,
--Mikhail

Have feedback ?


On Wed, Feb 6, 2019 at 8:05 AM Andrew Pilloud  wrote:

> SQL doesn't cleanup pubsub subscriptions. Feel free to delete those.
>
> Andrew
>
> On Wed, Feb 6, 2019, 8:01 AM Mikhail Gryzykhin  wrote:
>
>> +Kenneth Knowles  you're working on SQL recently, so
>> might provide some info.
>>
>> I see a lot of topics of format
>> rojects/apache-beam-testing/topics/integ-test-PubsubJsonIT-testSQLLimit-2018-08-27-16-55-44-342-start-7392789257486934721
>> .
>> Seems we do not cleanup properly.
>>
>> Is it safe to cleanup topics with this name?
>>
>> --Mikhail
>>
>> Have feedback ?
>>
>>
>> On Wed, Feb 6, 2019 at 7:46 AM Mikhail Gryzykhin 
>> wrote:
>>
>>> Minor UPD:
>>> As expected it fails most of our test jobs, since we use Pub/Subs in
>>> many tests.
>>>
>>> --Mikhail
>>>
>>> Have feedback ?
>>>
>>>
>>> On Wed, Feb 6, 2019 at 7:38 AM Mikhail Gryzykhin 
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Our python pipelines failed with limit exceeded error:
>>>>
>>>> ResourceExhausted: 429 Your project has exceeded a limit:
>>>> (type="topics-per-project", current=1, maximum=1).
>>>>
>>>>
>>>> Does anyone know if there were new tests that use topics added recently?
>>>>
>>>> I tried to see list of topics, but UI fails to load. Will see if I can
>>>> use APIs to investigate.
>>>>
>>>> If anyone has good insight, please, pick up BEAM-6610.
>>>>
>>>> --Mikhail
>>>>
>>>> Have feedback ?
>>>>
>>>


Re: Beam Python streaming pipeline on Flink Runner

2019-02-06 Thread Maximilian Michels

Thanks for your replies Robert and Cham.

What I had in mind was a generic wrapper that would easily allow users
to use IO from Java. Such a wrapper could start as an experimental feature
and then, through URN versioning, eventually become stable.


UDFs are needed, though they are a special case. Most users (including
Matthias) just want to specify a few String options, which do not require
UDFs but something along the lines of what I proposed here.
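
To illustrate the string-options-only case, here is a minimal sketch of how the Python side could build the JsonConfiguredSource payload quoted further down in this thread, and how the receiving side could apply it. The URN string, the three payload fields, and the PubSubIO example values are taken from that proposal; everything else (`JsonConfiguredPayload`, `make_payload`, `apply_config`, the dict used as a "builder") is a hypothetical name made up for this example and not an existing Beam API.

```python
import json
from dataclasses import dataclass

URN = "JsonConfiguredSource:v1"  # URN string as written in the proposal


@dataclass(frozen=True)
class JsonConfiguredPayload:
    environment: str          # id of the environment (Java/Python/Go)
    resource_identifier: str  # class name of the IO to instantiate
    configuration: str        # JSON-encoded map of string options


def make_payload(environment, resource_identifier, **options):
    """Build the payload; only plain string options are accepted (no UDFs)."""
    for key, value in options.items():
        if not isinstance(value, str):
            raise TypeError("option %r must be a string" % key)
    return JsonConfiguredPayload(
        environment=environment,
        resource_identifier=resource_identifier,
        configuration=json.dumps(options, sort_keys=True),
    )


def apply_config(builder, payload):
    """Receiving side: decode the JSON and set each option on a builder.

    A plain dict stands in for the IO builder here.
    """
    for key, value in json.loads(payload.configuration).items():
        builder[key] = value
    return builder
```

For example, `make_payload("java_env", "org.apache.beam.io.PubSubIO", topic="my_pubsub_topic")` carries the configuration as the JSON string `{"topic": "my_pubsub_topic"}`. Anything that is not a plain string is rejected up front, which is exactly the boundary between this easy case and the UDF case.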


Robert wrote:

UDFs that are called from within an IO as part of its operation is
still an open question.


Exactly. How about we solve the easier case first, unblock users, and 
then think more about solving the general case?


Cham wrote:

I'm happy to work with you to realize this.


Would be great to exchange more ideas on this! I can compile the current 
ideas we have in a document and we move from there.


Thanks,
Max

On 05.02.19 17:56, Chamikara Jayalath wrote:



On Tue, Feb 5, 2019, 8:11 AM Maximilian Michels  wrote:


Good points Cham.

JSON seemed like the most intuitive way to specify a configuration map.
We already use JSON in other places, e.g. to specify the environment
configuration. It is not necessarily a contradiction to have JSON inside
Protobuf. From the perspective of IO authors, the user-friendliness
plays a role because they wouldn't have to deal with Protobuf.


It's a good point that JSON will make this more user-friendly for IO 
transforms authors. Probably we should do a bit of experimentation and 
keep this experimental in case we hit a performance snag.



I agree that the configuration format is an implementation detail that
will be hidden to users via easy-to-use wrappers.

Do we have to support UDFs for expanding existing IO? Users would still
be able to apply UDFs via ParDo on the IO output collections. Generally
speaking, I can see how for cross-language transforms UDF support would
be good. For example, a Combine implementation in Java, where the
combine UDFs come from Python.


I think we should try to support UDFs in the first version unless 
there's a major blocker that hinders realizing this. Many IO transforms 
available today expect users to pass UDFs to realize various features 
(for example, dynamic destinations for BigqueryIO and FileIO, timestamp 
function for KafkaIO). I think without support for UDFs usability of 
cross-language transforms feature will be significantly limited.


I'm happy to work with you to realize this.

Thanks,
Cham


I suppose the question is, do we try to solve the general case, or do we
go with a simpler approach for enabling the use of existing IO first?
Lack of IO seems to be the most pressing issue for the adoption of Beam
Python. I imagine that a backwards-compatible incremental support for
cross-language transforms (IOs first, later other transforms) would be
possible.

-Max

On 05.02.19 03:07, Chamikara Jayalath wrote:
 >
 >
 > On Fri, Feb 1, 2019 at 6:12 AM Maximilian Michels <m...@apache.org> wrote:
 >
 >     Yes, I imagine sources to implement a JsonConfigurable interface (e.g.
 >     on their builders):
 >
 >     JsonConfigurable {
 >         // Either a json string or Map
 >         apply(String jsonConfig);
 >     }
 >
 >     In Python we would create this transform:
 >
 >     URN: JsonConfiguredSource:v1
 >     payload: {
 >          environment: environment_id, // Java/Python/Go
 >          resourceIdentifier: string,  // "org.apache.beam.io.PubSubIO"
 >          configuration: json config,  // { "topic" : "my_pubsub_topic" }
 >     }
 >
 >
 > Thanks Max, this is a great first step towards defining the API for
 > cross-language transforms.
 > Is there a reason why you would want to use JSON instead of a proto
 > here? I guess we'll be providing a more user-friendly language wrapper
 > (for example, Python) for end-users here, so user-friendliness-wise, the
 > format we choose won't matter much (for pipeline authors).
 > If we don't support UDFs, the performance difference will be negligible, but
 > UDFs might require a callback to the original SDK (per element in the worst
 > case). So it might make sense to choose the more efficient format.
 >
 > Also, we probably need a more expanded definition (proto/JSON)
 > to support UDFs. For example, a payload + a set of parameter definitions
 > so that the target SDK (for example, Java) can call back the original
 > SDK where the pipeline was authored (for example, Python) to resolve
 > UDFs at runtime.
 >
 > Thanks,
 > Cham
 >
 >     That's more generic and could be used for other languages where we
 >     might have sources/sinks.
 

Re: [VOTE] Release 2.10.0, release candidate #1

2019-02-06 Thread Etienne Chauchot
Hi,
I just fixed both (one was not a bug but an error in test code) in this PR [1].

[1] https://github.com/apache/beam/pull/7751

Etienne
On Tuesday, 5 February 2019 at 17:37 +0100, Etienne Chauchot wrote:
> Hi guys,
> I just found 2 bugs while replacing the mock in CassandraIO by a proper
> instance:
> https://issues.apache.org/jira/browse/BEAM-6592
> https://issues.apache.org/jira/browse/BEAM-6591
> I don't think they are release blockers because they have been there since
> CassandraIO first version. One of them is quite tricky, IMHO I don't think we
> should wait for the fix before the release.
> Etienne
> On Wednesday, 30 January 2019 at 10:01 -0800, Chamikara Jayalath wrote:
> > FYI, created another blocker: 
> > https://issues.apache.org/jira/browse/BEAM-6552
> > 
> > Thanks,
> > Cham
> > On Tue, Jan 29, 2019 at 4:38 PM Ahmet Altay  wrote:
> > > -1, I ran into a new blocking issue: 
> > > https://issues.apache.org/jira/browse/BEAM-6545
> > > On Tue, Jan 29, 2019 at 4:08 PM Kenneth Knowles  wrote:
> > > > I have done this in the least vulnerable way I can think of. I have 
> > > > filed 
> > > > https://issues.apache.org/jira/browse/BEAM-6544 as a blocker to fix the 
> > > > release process.
> > > > Kenn
> > > > On Tue, Jan 29, 2019 at 3:07 PM Kenneth Knowles  wrote:
> > > > > Yes, the instructions for building the wheels includes inputting my 
> > > > > ASF credentials into Travis-CI. I've been
> > > > > trying to understand why and what I can do instead.
> > > > > (The release guide says that the release script builds the binaries, 
> > > > > but from what I can tell it does not.
> > > > > This makes sense because the instructions are highly manual too.)
> > > > > Kenn
> > > > > On Tue, Jan 29, 2019 at 12:38 AM Robert Bradshaw 
> > > > >  wrote:
> > > > > > > The artifacts and signatures look good. But we're missing Python
> > > > > > > wheels.
> > > > > > >
> > > > > > > On Tue, Jan 29, 2019 at 6:08 AM Kenneth Knowles wrote:
> > > > > > > >
> > > > > > > > Ah, I did not close the staging repository. Thanks for letting me
> > > > > > > > know. Try now.
> > > > > > > >
> > > > > > > > Kenn
> > > > > > > >
> > > > > > > > On Mon, Jan 28, 2019 at 2:31 PM Ismaël Mejía wrote:
> > > > > > > >>
> > > > > > > >> I think there is an issue, [4] does not open?
> > > > > > > >>
> > > > > > > >> On Mon, Jan 28, 2019 at 6:24 PM Kenneth Knowles wrote:
> > > > > > > >> >
> > > > > > > >> > Hi everyone,
> > > > > > > >> >
> > > > > > > >> > Please review and vote on the release candidate #1 for the
> > > > > > > >> > version 2.10.0, as follows:
> > > > > > > >> >
> > > > > > > >> > [ ] +1, Approve the release
> > > > > > > >> > [ ] -1, Do not approve the release (please provide specific
> > > > > > > >> > comments)
> > > > > > > >> >
> > > > > > > >> > The complete staging area is available for your review, which
> > > > > > > >> > includes:
> > > > > > > >> > * JIRA release notes [1],
> > > > > > > >> > * the official Apache source release to be deployed to
> > > > > > > >> > dist.apache.org [2], which is signed with the key with
> > > > > > > >> > fingerprint 6ED551A8AE02461C [3],
> > > > > > > >> > * all artifacts to be deployed to the Maven Central Repository
> > > > > > > >> > [4],
> > > > > > > >> > * source code tag "v2.10.0-RC1" [5],
> > > > > > > >> > * website pull request listing the release [6] and publishing
> > > > > > > >> > the API reference manual [7].
> > > > > > > >> > * Python artifacts are deployed along with the source release
> > > > > > > >> > to the dist.apache.org [2].
> > > > > > > >> > * Validation sheet with a tab for 2.10.0 release to help with
> > > > > > > >> > validation [7].
> > > > > > > >> >
> > > > > > > >> > The vote will be open for at least 72 hours. It is adopted by
> > > > > > > >> > majority approval, with at least 3 PMC affirmative votes.
> > > > > > > >> >
> > > > > > > >> > Thanks,
> > > > > > > >> > Kenn
> > > > > > > >> >
> > > > > > > >> > [1]
> > > > > > > >> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344540
> > > > > > > >> > [2] https://dist.apache.org/repos/dist/dev/beam/2.10.0/
> > > > > > > >> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> > > > > > > >> > [4]
> > > > > > > >> > https://repository.apache.org/content/repositories/orgapachebeam-1056/
> > > > > > > >> > [5] https://github.com/apache/beam/tree/v2.10.0-RC1
> > > > > > > >> > [6] https://github.com/apache/beam/pull/7651/files
> > > > > > >> > 

Re: [DISCUSSION] UTests and embedded backends

2019-02-06 Thread Etienne Chauchot
Hi guys,

I just submitted the PR: https://github.com/apache/beam/pull/7751. It contains
refactorings, test improvements/fixes, and production code fixes.

I wanted to give a little feedback, because replacing the mock with a real
instance made it possible to:
- improve the tests: fix bad tests
- add a missing split test
- and, more importantly, discover a bug in the production code of the split
  and fix it.

=> So I would love it if we all agreed to avoid mocks when possible. Of course,
as mentioned, sometimes mocks cannot be avoided, e.g. for hosted backends.

Etienne
On Monday, 28 January 2019 at 11:16 +0100, Etienne Chauchot wrote:
> Guys,
> I will try using mocks where I see it is needed. As there is a current PR
> opened on Cassandra, I will take this opportunity to add the embedded
> cassandra server (https://github.com/jsevellec/cassandra-unit) to the UTests.
> A ticket was opened a while ago: https://issues.apache.org/jira/browse/BEAM-4164
> Etienne
> On Tuesday, 22 January 2019 at 09:26 +0100, Robert Bradshaw wrote:
> > On Mon, Jan 21, 2019 at 10:42 PM Kenneth Knowles wrote:
> > Robert - you meant this as a mostly-automatic thing that we would engineer,
> > yes?
> >
> > Yes, something like TestPipeline that buffers up the pipelines and then
> > executes on class teardown (details TBD).
> >
> > A lighter-weight fake, like using something in-process sharing a Java
> > interface (versus today a locally running service sharing an RPC interface)
> > is still much better than a mock.
> >
> > +1
> >
> > Kenn
> >
> > On Mon, Jan 21, 2019 at 7:17 AM Jean-Baptiste Onofré wrote:
> > Hi,
> >
> > it makes sense to use embedded backend when:
> > 1. it's possible to easily embed the backend
> > 2. when the backend is "predictable".
> >
> > If it's easy to embed and the backend behavior is predictable, then it makes
> > sense. In other cases, we can fallback to mock.
> >
> > Regards
> > JB
> >
> > On 21/01/2019 10:07, Etienne Chauchot wrote:
> > Hi guys,
> >
> > Lately I have been fixing various Elasticsearch flakiness issues in the
> > UTests by: introducing timeouts, countdown latches, force refresh, embedded
> > cluster size decrease ...
> >
> > These flakiness issues are due to the embedded Elasticsearch not coping well
> > with the jenkins overload. Still, IMHO I believe that having embedded backend
> > for UTests are a lot better than mocks. Even if they are less tolerant to
> > load, I prefer having UTests 100% representative of real backend and add
> > countermeasures to protect against jenkins overload.
> >
> > WDYT ?
> >
> > Etienne
> >
> > --
> > Jean-Baptiste Onofré
> > jbonofre@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com


Re: Resource usage exceeded: topics-per-project

2019-02-06 Thread Andrew Pilloud
SQL doesn't clean up Pub/Sub subscriptions. Feel free to delete those.

Andrew

On Wed, Feb 6, 2019, 8:01 AM Mikhail Gryzykhin  wrote:

> +Kenneth Knowles  you're working on SQL recently, so
> might provide some info.
>
> I see a lot of topics of format
> rojects/apache-beam-testing/topics/integ-test-PubsubJsonIT-testSQLLimit-2018-08-27-16-55-44-342-start-7392789257486934721
> .
> Seems we do not cleanup properly.
>
> Is it safe to cleanup topics with this name?
>
> --Mikhail
>
> Have feedback ?
>
>
> On Wed, Feb 6, 2019 at 7:46 AM Mikhail Gryzykhin 
> wrote:
>
>> Minor UPD:
>> As expected it fails most of our test jobs, since we use Pub/Subs in many
>> tests.
>>
>> --Mikhail
>>
>> Have feedback ?
>>
>>
>> On Wed, Feb 6, 2019 at 7:38 AM Mikhail Gryzykhin 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Our python pipelines failed with limit exceeded error:
>>>
>>> ResourceExhausted: 429 Your project has exceeded a limit:
>>> (type="topics-per-project", current=1, maximum=1).
>>>
>>>
>>> Does anyone know if there were new tests that use topics added recently?
>>>
>>> I tried to see list of topics, but UI fails to load. Will see if I can
>>> use APIs to investigate.
>>>
>>> If anyone has good insight, please, pick up BEAM-6610.
>>>
>>> --Mikhail
>>>
>>> Have feedback ?
>>>
>>


Re: Resource usage exceeded: topics-per-project

2019-02-06 Thread Mikhail Gryzykhin
+Kenneth Knowles, you've been working on SQL recently, so you might be able to
provide some info.

I see a lot of topics of the format
rojects/apache-beam-testing/topics/integ-test-PubsubJsonIT-testSQLLimit-2018-08-27-16-55-44-342-start-7392789257486934721
.
It seems we do not clean up properly.

Is it safe to delete topics with this name?
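
If it is safe, something along these lines could pick the deletion candidates. This is only a sketch under assumptions: it assumes leaked test topics always look like the one above (an `integ-test-` prefix followed by a `yyyy-MM-dd-HH-mm-ss-SSS` timestamp), and the function name, the one-day threshold, and the project name in the usage note are made up for the example. It only selects names and does not call any Pub/Sub API.

```python
import re
from datetime import datetime, timedelta

# Assumed naming scheme: .../topics/integ-test-<test id>-<timestamp>-<millis>-...
_STAMP = re.compile(r"/topics/integ-test-.+?-(\d{4}(?:-\d{2}){5})-\d{3}-")


def stale_test_topics(topic_names, now, max_age=timedelta(days=1)):
    """Return the test topics whose embedded creation time is older than max_age."""
    stale = []
    for name in topic_names:
        match = _STAMP.search(name)
        if match is None:
            continue  # not recognisably a test topic: never touch it
        created = datetime.strptime(match.group(1), "%Y-%m-%d-%H-%M-%S")
        if now - created > max_age:
            stale.append(name)
    return stale
```

Run against a listing taken on 2019-02-06, a topic stamped 2018-08-27 would be selected, while a topic from the same morning or a topic without the `integ-test-` prefix would be left alone.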

--Mikhail

Have feedback ?


On Wed, Feb 6, 2019 at 7:46 AM Mikhail Gryzykhin  wrote:

> Minor UPD:
> As expected it fails most of our test jobs, since we use Pub/Subs in many
> tests.
>
> --Mikhail
>
> Have feedback ?
>
>
> On Wed, Feb 6, 2019 at 7:38 AM Mikhail Gryzykhin 
> wrote:
>
>> Hi everyone,
>>
>> Our python pipelines failed with limit exceeded error:
>>
>> ResourceExhausted: 429 Your project has exceeded a limit:
>> (type="topics-per-project", current=1, maximum=1).
>>
>>
>> Does anyone know if there were new tests that use topics added recently?
>>
>> I tried to see list of topics, but UI fails to load. Will see if I can
>> use APIs to investigate.
>>
>> If anyone has good insight, please, pick up BEAM-6610.
>>
>> --Mikhail
>>
>> Have feedback ?
>>
>


Re: Resource usage exceeded: topics-per-project

2019-02-06 Thread Mikhail Gryzykhin
Minor update:
As expected, this fails most of our test jobs, since we use Pub/Sub in many
tests.

--Mikhail

Have feedback ?


On Wed, Feb 6, 2019 at 7:38 AM Mikhail Gryzykhin  wrote:

> Hi everyone,
>
> Our python pipelines failed with limit exceeded error:
>
> ResourceExhausted: 429 Your project has exceeded a limit:
> (type="topics-per-project", current=1, maximum=1).
>
>
> Does anyone know if there were new tests that use topics added recently?
>
> I tried to see list of topics, but UI fails to load. Will see if I can
> use APIs to investigate.
>
> If anyone has good insight, please, pick up BEAM-6610.
>
> --Mikhail
>
> Have feedback ?
>


Resource usage exceeded: topics-per-project

2019-02-06 Thread Mikhail Gryzykhin
Hi everyone,

Our Python pipelines failed with a limit-exceeded error:

ResourceExhausted: 429 Your project has exceeded a limit:
(type="topics-per-project", current=1, maximum=1).


Does anyone know if new tests that use topics were added recently?

I tried to view the list of topics, but the UI fails to load. I will see if I
can use the APIs to investigate.

If anyone has good insight, please pick up BEAM-6610.

--Mikhail

Have feedback ?


Re: [VOTE] Release 2.10.0, release candidate #2

2019-02-06 Thread Maximilian Michels

- Ran Flink WordCount with Quickstart guide
- Ran release testing scripts for Flink

Discovered a regression: https://jira.apache.org/jira/browse/BEAM-6608

If there is another blocker for the release, I would like to fix this
for RC3. PR is already out.


Thanks,
Max

On 06.02.19 11:24, Robert Bradshaw wrote:

+1.

I verified the source artifacts look good, and tried the Python wheels.

On Tue, Feb 5, 2019 at 11:57 PM Kenneth Knowles  wrote:


Hi everyone,

Please review and vote on the release candidate #2 for the version 2.10.0, as 
follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2], 
which is signed with the key with fingerprint 6ED551A8AE02461C [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.10.0-RC2" [5],
* website pull request listing the release [6] and publishing the API reference 
manual [7].
* Python artifacts are deployed along with the source release to the 
dist.apache.org [2].
* Validation sheet with a tab for 2.10.0 release to help with validation [8].

The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.

Thanks,
Kenn

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344540
[2] https://dist.apache.org/repos/dist/dev/beam/2.10.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1057/
[5] https://github.com/apache/beam/tree/v2.10.0-RC2
[6] https://github.com/apache/beam/pull/7651/files
[7] https://github.com/apache/beam-site/pull/586
[8] 
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529


Re: [VOTE] Release 2.10.0, release candidate #2

2019-02-06 Thread Etienne Chauchot
Hi,

I checked Nexmark on both output size (functional regression detection) and
run time (performance regression). The only thing I see is a performance
regression on query7 (side input + fanout) in the Spark runner, but this
regression has been there since before the previous release cut. Indeed, 2.9
was cut on 18/12/06 and the perf regression started on 18/10/05. So I don't
think it is a blocker.

Also, I see this ticket tagged as a blocker:
https://issues.apache.org/jira/browse/BEAM-3261. It is a very old ticket.
Should we target it for a later release?

Etienne
On Wednesday, 6 February 2019 at 11:26 +0100, Jean-Baptiste Onofré wrote:
> +1 (binding)
>
> Quickly tested on beam-samples.
>
> Regards
> JB
>
> On 05/02/2019 23:57, Kenneth Knowles wrote:
> > Hi everyone,
> >
> > Please review and vote on the release candidate #2 for the
> > version 2.10.0, as follows:
> >
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> > [2], which is signed with the key with fingerprint 6ED551A8AE02461C [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "v2.10.0-RC2" [5],
> > * website pull request listing the release [6] and publishing the API
> > reference manual [7].
> > * Python artifacts are deployed along with the source release to
> > the dist.apache.org [2].
> > * Validation sheet with a tab for 2.10.0 release to help with validation
> > [8].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > Kenn
> >
> > [1]
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344540
> > [2] https://dist.apache.org/repos/dist/dev/beam/2.10.0/
> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> > [4] https://repository.apache.org/content/repositories/orgapachebeam-1057/
> > [5] https://github.com/apache/beam/tree/v2.10.0-RC2
> > [6] https://github.com/apache/beam/pull/7651/files
> > [7] https://github.com/apache/beam-site/pull/586
> > [8]
> > https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
> 


Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-02-06 Thread Jean-Baptiste Onofré
+1

Thanks for that Ismaël.

Regards
JB

On 06/02/2019 11:24, Ismaël Mejía wrote:
> Since it seems we have consensus on deprecating both transforms I created
> 
> BEAM-6605 Deprecate TextIO.readAll() and TextIO.ReadAll transform
> BEAM-6606 Deprecate AvroIO.readAll() and AvroIO.ReadAll transform
> 
> Thanks everyone.
> 
> On Fri, Feb 1, 2019 at 7:03 PM Chamikara Jayalath  
> wrote:
>>
>> Python SDK doesn't have FileIO yet so let's keep ReadAllFromFoo transforms 
>> currently available for various file types around till we have that.
>>
>> Thanks,
>> Cham
>>
>> On Fri, Feb 1, 2019 at 7:41 AM Jean-Baptiste Onofré  
>> wrote:
>>>
>>> Hi,
>>>
>>> readFiles() should be used IMHO. We should remove readAll() to avoid
>>> confusion.
>>>
>>> Regards
>>> JB
>>>
>>> On 30/01/2019 17:25, Ismaël Mejía wrote:
>>>> Hello,
>>>>
>>>> A ‘recent’ pattern of use in Beam is to have in file based IOs a
>>>> `readAll()` implementation that basically matches a `PCollection` of
>>>> file patterns and reads them, e.g. `TextIO`, `AvroIO`. `ReadAll` is
>>>> implemented by an expand function that matches files with FileIO and
>>>> then reads them using a format specific `ReadFiles` transform e.g.
>>>> TextIO.ReadFiles, AvroIO.ReadFiles. So in the end `ReadAll` in the
>>>> Java implementation is just an user friendly API to hide FileIO.match
>>>> + ReadFiles.
>>>>
>>>> Most recent IOs do NOT implement ReadAll to encourage the more
>>>> composable approach of File + ReadFiles, e.g. XmlIO and ParquetIO.
>>>>
>>>> Implementing ReadAll as a wrapper is relatively easy and is definitely
>>>> user friendly, but it has an issue: it may be error-prone and it adds
>>>> more code to maintain (mostly ‘repeated’ code). However `readAll` is a
>>>> more abstract pattern that applies not only to File based IOs so it
>>>> makes sense for example in other transforms that map a `PCollection`
>>>> of read requests and is the basis for SDF composable style APIs like
>>>> the recent `HBaseIO.readAll()`.
>>>>
>>>> So the question is should we:
>>>>
>>>> [1] Implement `readAll` in all file based IOs to be user friendly and
>>>> assume the (minor) maintenance cost
>>>>
>>>> or
>>>>
>>>> [2] Deprecate `readAll` from file based IOs and encourage users to use
>>>> FileIO + `readFiles` (less maintenance and encourage composition).
>>>>
>>>> I just checked quickly in the python code base but I did not find if
>>>> the File match + ReadFiles pattern applies, but it would be nice to
>>>> see what the python guys think on this too.
>>>>
>>>> This discussion comes from a recent slack conversation with Łukasz
>>>> Gajowy, and we wanted to settle into one approach to make the IO
>>>> signatures consistent, so any opinions/preferences?
>>>>
>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [VOTE] Release 2.10.0, release candidate #2

2019-02-06 Thread Jean-Baptiste Onofré
+1 (binding)

Quickly tested on beam-samples.

Regards
JB

On 05/02/2019 23:57, Kenneth Knowles wrote:
> Hi everyone,
> 
> Please review and vote on the release candidate #2 for the
> version 2.10.0, as follows:
> 
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
>  [2], which is signed with the key with
> fingerprint 6ED551A8AE02461C [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.10.0-RC2" [5],
> * website pull request listing the release [6] and publishing the API
> reference manual [7].
> * Python artifacts are deployed along with the source release to
> the dist.apache.org [2].
> * Validation sheet with a tab for 2.10.0 release to help with validation
> [8].
> 
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
> 
> Thanks,
> Kenn
> 
> [1] 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344540
> [2] https://dist.apache.org/repos/dist/dev/beam/2.10.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> 
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1057/
> [5] https://github.com/apache/beam/tree/v2.10.0-RC2
> [6] https://github.com/apache/beam/pull/7651/files
> [7] https://github.com/apache/beam-site/pull/586
> [8] 
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-02-06 Thread Ismaël Mejía
Since it seems we have consensus on deprecating both transforms I created

BEAM-6605 Deprecate TextIO.readAll() and TextIO.ReadAll transform
BEAM-6606 Deprecate AvroIO.readAll() and AvroIO.ReadAll transform

Thanks everyone.
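
For anyone skimming the thread, the reason the wrapper is redundant can be shown with a small plain-Python analogy. This is not Beam code: `match`, `read_files`, and `read_all` are hypothetical stand-ins for FileIO.match, a format-specific ReadFiles, and ReadAll, operating on ordinary iterables instead of PCollections.

```python
import glob
import os
import tempfile


def match(patterns):
    """Expand file patterns into concrete paths (stand-in for FileIO.match)."""
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            yield path


def read_files(paths):
    """Read lines from already-matched files (stand-in for a ReadFiles transform)."""
    for path in paths:
        with open(path) as handle:
            for line in handle:
                yield line.rstrip("\n")


def read_all(patterns):
    """The convenience wrapper: nothing more than match followed by read_files."""
    return read_files(match(patterns))


def demo():
    """Write two small files and read them back through the wrapper."""
    with tempfile.TemporaryDirectory() as root:
        for name, text in [("a.txt", "one\ntwo\n"), ("b.txt", "three\n")]:
            with open(os.path.join(root, name), "w") as handle:
                handle.write(text)
        return list(read_all([os.path.join(root, "*.txt")]))
```

Since `read_all` adds no behaviour of its own, users who compose the two steps themselves lose nothing, and they can put their own filtering between the match and the read.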

On Fri, Feb 1, 2019 at 7:03 PM Chamikara Jayalath  wrote:
>
> The Python SDK doesn't have FileIO yet, so let's keep the ReadAllFromFoo
> transforms that are currently available for various file types until we
> have that.
>
> Thanks,
> Cham
>
> On Fri, Feb 1, 2019 at 7:41 AM Jean-Baptiste Onofré  wrote:
>>
>> Hi,
>>
>> readFiles() should be used IMHO. We should remove readAll() to avoid
>> confusion.
>>
>> Regards
>> JB
>>
>> On 30/01/2019 17:25, Ismaël Mejía wrote:
>> > Hello,
>> >
>> > A ‘recent’ usage pattern in Beam is for file-based IOs, e.g. `TextIO`
>> > and `AvroIO`, to offer a `readAll()` implementation that matches a
>> > `PCollection` of file patterns and reads the matching files. `ReadAll`
>> > is implemented by an expand function that matches files with FileIO and
>> > then reads them using a format-specific `ReadFiles` transform, e.g.
>> > TextIO.ReadFiles, AvroIO.ReadFiles. So in the end `ReadAll` in the
>> > Java implementation is just a user-friendly API that hides
>> > FileIO.match + ReadFiles.
>> >
>> > Most recent IOs do NOT implement ReadAll, to encourage the more
>> > composable approach of FileIO + ReadFiles, e.g. XmlIO and ParquetIO.
>> >
>> > Implementing ReadAll as a wrapper is relatively easy and is definitely
>> > user-friendly, but it has a drawback: it may be error-prone and it adds
>> > more code to maintain (mostly ‘repeated’ code). However, `readAll` is a
>> > more abstract pattern that applies beyond file-based IOs: it makes
>> > sense, for example, in other transforms that map a `PCollection` of
>> > read requests, and it is the basis for SDF-style composable APIs like
>> > the recent `HBaseIO.readAll()`.
>> >
>> > So the question is should we:
>> >
>> > [1] Implement `readAll` in all file-based IOs to be user-friendly and
>> > accept the (minor) maintenance cost
>> >
>> > or
>> >
>> > [2] Deprecate `readAll` in file-based IOs and encourage users to use
>> > FileIO + `readFiles` (less maintenance, and it encourages composition).
>> >
>> > I took a quick look at the Python code base but could not tell whether
>> > the FileIO match + ReadFiles pattern applies there; it would be nice to
>> > hear what the Python guys think about this too.
>> >
>> > This discussion comes from a recent Slack conversation with Łukasz
>> > Gajowy, and we wanted to settle on one approach to make the IO
>> > signatures consistent, so any opinions/preferences?
>> >
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com


Re: [VOTE] Release 2.10.0, release candidate #2

2019-02-06 Thread Robert Bradshaw
+1.

I verified the source artifacts look good, and tried the Python wheels.

On Tue, Feb 5, 2019 at 11:57 PM Kenneth Knowles  wrote:
>
> Hi everyone,
>
> Please review and vote on the release candidate #2 for the version 2.10.0, as 
> follows:
>
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org [2], 
> which is signed with the key with fingerprint 6ED551A8AE02461C [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.10.0-RC2" [5],
> * website pull request listing the release [6] and publishing the API 
> reference manual [7].
> * Python artifacts are deployed along with the source release to the 
> dist.apache.org [2].
> * Validation sheet with a tab for 2.10.0 release to help with validation [8].
>
> The vote will be open for at least 72 hours. It is adopted by majority 
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Kenn
>
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344540
> [2] https://dist.apache.org/repos/dist/dev/beam/2.10.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1057/
> [5] https://github.com/apache/beam/tree/v2.10.0-RC2
> [6] https://github.com/apache/beam/pull/7651/files
> [7] https://github.com/apache/beam-site/pull/586
> [8] https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529