[VOTE] Release 2.7.0, release candidate #2

2018-09-24 Thread Charles Chen
Hi everyone,

Please review and vote on release candidate #2 for version 2.7.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint 45C60AAAD115F560 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.7.0-RC2" [5],
* website pull request listing the release and publishing the API reference
manual [6],
* Java artifacts were built with Gradle 4.8 and OpenJDK
1.8.0_181-8u181-b13-1~deb9u1,
* Python artifacts are deployed along with the source release to
dist.apache.org [2].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Charles

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12343654
[2] https://dist.apache.org/repos/dist/dev/beam/2.7.0
[3] https://dist.apache.org/repos/dist/dev/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1047/
[5] https://github.com/apache/beam/tree/v2.7.0-RC2
[6] https://github.com/apache/beam-site/pull/549
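
For readers tallying the thread, the adoption rule stated above (majority approval with at least 3 affirmative PMC votes) can be sketched as follows. This is only an illustration: the `Vote` type and `is_adopted` helper are hypothetical names, and treating PMC votes as the only binding ones is an assumption of this sketch, not something the message spells out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    """One vote on the thread (hypothetical type for this sketch)."""
    voter: str
    value: int    # +1 to approve, -1 to reject
    is_pmc: bool  # assumed: only PMC votes are counted as binding


def is_adopted(votes, min_pmc_approvals=3):
    """Return True if binding +1 votes reach the minimum and outnumber -1 votes."""
    binding = [v for v in votes if v.is_pmc]
    approvals = sum(1 for v in binding if v.value > 0)
    rejections = sum(1 for v in binding if v.value < 0)
    return approvals >= min_pmc_approvals and approvals > rejections
```

Non-binding votes are ignored by the tally but are still useful feedback for the release manager.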


Re: [jira] [Commented] (BEAM-5468) Allow runner to set worker log level in Python SDK harness.

2018-09-24 Thread Thomas Weise
I did not find a related option in the Java SDK. Logging in the Java based
runners is typically configured at the JVM level.

For the SDK harness, since the harness process will be per pipeline, it
would be possible to use a pipeline option also.

The runner could provide a default behavior in any case, by setting a
pipeline option (if not provided by the user) or an environment variable. I
think that by default the harness log level should follow the runner log
level.

From a user perspective, it might be confusing to have different ways to
control logging for runner and sdk harness?
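
To make the precedence discussed above concrete, here is a minimal sketch of how a harness could pick its log level: an explicit pipeline option wins, then an environment variable set by the runner, and otherwise the runner's own level is followed. The option name `harness_log_level`, the variable `HARNESS_LOG_LEVEL`, and the function itself are hypothetical; nothing here is existing Beam API.

```python
import logging
import os

# Hypothetical names, used only for this illustration.
HARNESS_LOG_LEVEL_OPTION = "harness_log_level"
HARNESS_LOG_LEVEL_ENV = "HARNESS_LOG_LEVEL"

_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def resolve_harness_log_level(pipeline_options, runner_level, environ=None):
    """Pick the harness log level: pipeline option, then env var, then runner level."""
    environ = os.environ if environ is None else environ
    name = (pipeline_options.get(HARNESS_LOG_LEVEL_OPTION)
            or environ.get(HARNESS_LOG_LEVEL_ENV))
    if name is None:
        # Nothing was configured, so follow the runner's log level.
        return runner_level
    try:
        return _LEVELS[name.upper()]
    except KeyError:
        raise ValueError("unknown log level: %r" % name)
```

With this ordering a user-supplied option always overrides whatever default the runner injects.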


On Mon, Sep 24, 2018 at 5:58 AM Robert Bradshaw  wrote:

> I think it may already be a pipeline option on the Java side. I'd rather
> minimize the number of environment variables we are using to control
> behavior, but I haven't looked at how hard it is to plumb this through.
>
> On Mon, Sep 24, 2018 at 2:21 PM Thomas Weise (JIRA) 
> wrote:
>
>>
>> [
>> https://issues.apache.org/jira/browse/BEAM-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625743#comment-16625743
>> ]
>>
>> Thomas Weise commented on BEAM-5468:
>> 
>>
>> I was also planning to look at this. My idea was to add another
>> environment variable so this can be controlled from the job bundle factory
>> / environment manager, and it can follow the logging level on the runner
>> side.  Do you think this should be a pipeline option instead?
>>
>> > Allow runner to set worker log level in Python SDK harness.
>> > ---
>> >
>> > Key: BEAM-5468
>> > URL: https://issues.apache.org/jira/browse/BEAM-5468
>> > Project: Beam
>> >  Issue Type: Improvement
>> >  Components: sdk-py-harness
>> >Reporter: Robert Bradshaw
>> >Assignee: Robert Bradshaw
>> >Priority: Major
>> >
>>
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v7.6.3#76005)
>>
>


Re: Cleanup Jenkins old jobs

2018-09-24 Thread Thomas Weise
+1

On Mon, Sep 24, 2018 at 3:27 PM Yifan Zou  wrote:

> +1 on this. There are also some stale jobs that were created for test or
> experiment purposes. Removing them would keep the UI neat.
>
>
> On Mon, Sep 24, 2018 at 2:56 PM Andrew Pilloud 
> wrote:
>
>> This sounds good to me. A significant number of jobs have been renamed.
>> Having both the old and new jobs in the UI makes the correct job more
>> difficult to find.
>>
>> Andrew
>>
>> On Mon, Sep 24, 2018 at 2:52 PM Ankur Goenka  wrote:
>>
>>> Hi,
>>>
>>> The Jenkins UI has accumulated a lot of old jobs over time which are no
>>> longer in use.
>>> Shall we clean up old jobs (jobs which have not run in the last 7 days)
>>> from the Jenkins UI for a cleaner view of valid jobs?
>>> This is a low-risk cleanup, as the Seed Job will recreate any valid job
>>> that gets removed.
>>>
>>>
>>> Thanks,
>>> Ankur
>>>
>>


Re: Cleanup Jenkins old jobs

2018-09-24 Thread Yifan Zou
+1 on this. There are also some stale jobs that were created for test or
experiment purposes. Removing them would keep the UI neat.


On Mon, Sep 24, 2018 at 2:56 PM Andrew Pilloud  wrote:

> This sounds good to me. A significant number of jobs have been renamed.
> Having both the old and new jobs in the UI makes the correct job more
> difficult to find.
>
> Andrew
>
> On Mon, Sep 24, 2018 at 2:52 PM Ankur Goenka  wrote:
>
>> Hi,
>>
>> The Jenkins UI has accumulated a lot of old jobs over time which are no
>> longer in use.
>> Shall we clean up old jobs (jobs which have not run in the last 7 days)
>> from the Jenkins UI for a cleaner view of valid jobs?
>> This is a low-risk cleanup, as the Seed Job will recreate any valid job
>> that gets removed.
>>
>>
>> Thanks,
>> Ankur
>>
>


Re: Cleanup Jenkins old jobs

2018-09-24 Thread Andrew Pilloud
This sounds good to me. A significant number of jobs have been renamed.
Having both the old and new jobs in the UI makes the correct job more
difficult to find.

Andrew

On Mon, Sep 24, 2018 at 2:52 PM Ankur Goenka  wrote:

> Hi,
>
> The Jenkins UI has accumulated a lot of old jobs over time which are no
> longer in use.
> Shall we clean up old jobs (jobs which have not run in the last 7 days)
> from the Jenkins UI for a cleaner view of valid jobs?
> This is a low-risk cleanup, as the Seed Job will recreate any valid job
> that gets removed.
>
>
> Thanks,
> Ankur
>


Cleanup Jenkins old jobs

2018-09-24 Thread Ankur Goenka
Hi,

The Jenkins UI has accumulated a lot of old jobs over time which are no
longer in use.
Shall we clean up old jobs (jobs which have not run in the last 7 days)
from the Jenkins UI for a cleaner view of valid jobs?
This is a low-risk cleanup, as the Seed Job will recreate any valid job
that gets removed.


Thanks,
Ankur
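
The selection rule proposed above (jobs which have not run in the last 7 days) could be expressed roughly like this. It is a sketch over an in-memory mapping of job names to last-run times; how that mapping would be obtained from Jenkins is left out, and the `stale_jobs` helper is a hypothetical name. Treating jobs that have never run as stale is an assumption of the sketch.

```python
from datetime import datetime, timedelta


def stale_jobs(last_run_by_job, now, max_age=timedelta(days=7)):
    """Return the sorted names of jobs whose last run is older than max_age.

    last_run_by_job maps a job name to a datetime, or to None if the job
    has never run (assumed here to count as stale).
    """
    cutoff = now - max_age
    return sorted(
        name
        for name, last_run in last_run_by_job.items()
        if last_run is None or last_run < cutoff
    )
```

Since the Seed Job recreates valid jobs, a false positive in this list would only cause a temporary removal.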


Re: Removing documentation for old Beam versions

2018-09-24 Thread Scott Wegner
> We could add a new default branch (master?) and keep all the
> non-generated files (src/) there, and put generated files (content/) in the
> asf-site branch (like we already do).

I'm strongly in favor of having sources in a single repository. We have
significant process and infrastructure built up for the apache/beam repo
(for build, PR, CI, release, etc.) that we can take advantage of by putting
website sources in the same repo. The current beam-site repo PR automation
is flaky because it was custom-built and not given the same level of
attention as the main repo.

The caveat to consolidating website sources in the main repo is that it
incentivizes putting the generated sources branch on the same repo. I've
documented a few of the reasons in the Appendix of the design doc [1]:
- It's easier to maintain a single repository; easily apply existing
  tooling/infrastructure
- Jenkins tooling for publishing generated HTML may not work cross-repo [2]

My preference is to move forward with the migration of sources to
apache/beam [master], and website generated HTML to apache/beam [asf-site].
I like the idea of separating the publishing/hosting of generated
javadocs/pydocs since they add so much cruft, but it should not hold up the
migration.

[1]
https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.wqwi2jpoiiuc

[2]
https://stackoverflow.com/questions/14843696/checkout-multiple-git-repos-into-same-jenkins-workspace

On Mon, Sep 24, 2018 at 2:33 PM Udi Meiri  wrote:

> Staying on beam-site SGTM. We could add a new default branch (master?) and
> keep all the non-generated files (src/) there, and put generated files
> (content/) in the asf-site branch (like we already do).
> That way there's no confusion as to which files you should update.
> (This is of course assuming we still place generated docs in git repos.)
>
> On Mon, Sep 24, 2018 at 11:23 AM Thomas Weise  wrote:
>
>> My thought was to leave the asf-site branch in the beam-site repository,
>> add generated docs to that branch (until we have a better solution), and
>> have only sources in the beam repo.
>>
>> Scott had filed https://issues.apache.org/jira/browse/BEAM-5459 -
>> it would eliminate the need to place generated docs into git repos.
>>
>> On Mon, Sep 24, 2018 at 11:06 AM Udi Meiri  wrote:
>>
>>> I believe that beam.apache.org is populated from the asf-site branch of
>>> the apache/beam-site repo. (gitpubsub:
>>> https://www.apache.org/dev/project-site.html#intro)
>>> If we move the markdown-based docs to apache/beam, leave generated
>>> javadoc and pydoc in apache/beam-site, and point gitpubsub to apache/beam,
>>> then javadoc and pydoc will not get pushed to the website.
>>>
>>> Is there some place where we can push javadoc and pydoc files? Or
>>> perhaps there is an alternative way to push updates to beam.apache.org?
>>> (not requiring the asf-site branch)
>>>
>>> On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise  wrote:
>>>
>>>> Hi Scott,
>>>>
>>>> Thanks for bringing the discussion back here.
>>>>
>>>> I agree that we should separate the changes for hosting of generated
>>>> java/pydocs from the rest of website automation so that we can make the
>>>> switch and fix the contributor headache soon.
>>>>
>>>> But perhaps we can avoid adding 4m lines of generated code to the main
>>>> beam repository (and keep on adding with every release) if we continue to
>>>> serve the site from the old beam-site repo? (I left a comment in the doc.)
>>>>
>>>> About trying buildbot, as mentioned earlier I would be happy to help
>>>> with it. I prefer a setup that keeps the docs separate from the web site.
>>>>
>>>> Thomas
>>>>
>>>>
>>>> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner wrote:
>>>>
>>>>> Re-opening this thread as it came up today in the discussion for
>>>>> PR#6458 [1]. This PR is part of the work for Beam-Site Automation
>>>>> Reliability improvements; design doc here:
>>>>> https://s.apache.org/beam-site-automation
>>>>>
>>>>> The current plan is to keep generated javadoc/pydoc sources only on
>>>>> the asf-site branch, which is necessary for the current githubpubsub
>>>>> publishing mechanism. This maintains our current approach, the only change
>>>>> being that we're moving the asf-site branch from the retiring
>>>>> apache/beam-site repository into a new apache/beam repo branch.
>>>>>
>>>>> The concern for committing generated content is the extra overhead
>>>>> during git fetch. I did some analysis to measure the impact [2], and found
>>>>> that fetching a week of source + generated content history from
>>>>> apache/beam-site took 0.39 seconds.
>>>>>
>>>>> I like the idea of publishing javadoc/pydoc snapshots to an external
>>>>> location like Flink does with buildbot, but that work is separable and
>>>>> shouldn't be a prerequisite for this effort. The goal of this work is to
>>>>> improve the reliability of automation for contributing website changes. At
>>>>> last measure, only about half of beam-site PR merges

Re: Removing documentation for old Beam versions

2018-09-24 Thread Udi Meiri
Staying on beam-site SGTM. We could add a new default branch (master?) and
keep all the non-generated files (src/) there, and put generated files
(content/) in the asf-site branch (like we already do).
That way there's no confusion as to which files you should update.
(This is of course assuming we still place generated docs in git repos.)

On Mon, Sep 24, 2018 at 11:23 AM Thomas Weise  wrote:

> My thought was to leave the asf-site branch in the beam-site repository,
> add generated docs to that branch (until we have a better solution), and
> have only sources in the beam repo.
>
> Scott had filed https://issues.apache.org/jira/browse/BEAM-5459 -
> it would eliminate the need to place generated docs into git repos.
>
> On Mon, Sep 24, 2018 at 11:06 AM Udi Meiri  wrote:
>
>> I believe that beam.apache.org is populated from the asf-site branch of
>> the apache/beam-site repo. (gitpubsub:
>> https://www.apache.org/dev/project-site.html#intro)
>> If we move the markdown-based docs to apache/beam, leave generated
>> javadoc and pydoc in apache/beam-site, and point gitpubsub to apache/beam,
>> then javadoc and pydoc will not get pushed to the website.
>>
>> Is there some place where we can push javadoc and pydoc files? Or perhaps
>> there is an alternative way to push updates to beam.apache.org? (not
>> requiring the asf-site branch)
>>
>> On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise  wrote:
>>
>>> Hi Scott,
>>>
>>> Thanks for bringing the discussion back here.
>>>
>>> I agree that we should separate the changes for hosting of generated
>>> java/pydocs from the rest of website automation so that we can make the
>>> switch and fix the contributor headache soon.
>>>
>>> But perhaps we can avoid adding 4m lines of generated code to the main
>>> beam repository (and keep on adding with every release) if we continue to
>>> serve the site from the old beam-site repo? (I left a comment in the doc.)
>>>
>>> About trying buildbot, as mentioned earlier I would be happy to help
>>> with it. I prefer a setup that keeps the docs separate from the web site.
>>>
>>> Thomas
>>>
>>>
>>> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner  wrote:
>>>
>>>> Re-opening this thread as it came up today in the discussion for
>>>> PR#6458 [1]. This PR is part of the work for Beam-Site Automation
>>>> Reliability improvements; design doc here:
>>>> https://s.apache.org/beam-site-automation
>>>>
>>>> The current plan is to keep generated javadoc/pydoc sources only on the
>>>> asf-site branch, which is necessary for the current githubpubsub publishing
>>>> mechanism. This maintains our current approach, the only change being that
>>>> we're moving the asf-site branch from the retiring apache/beam-site
>>>> repository into a new apache/beam repo branch.
>>>>
>>>> The concern for committing generated content is the extra overhead
>>>> during git fetch. I did some analysis to measure the impact [2], and found
>>>> that fetching a week of source + generated content history from
>>>> apache/beam-site took 0.39 seconds.
>>>>
>>>> I like the idea of publishing javadoc/pydoc snapshots to an external
>>>> location like Flink does with buildbot, but that work is separable and
>>>> shouldn't be a prerequisite for this effort. The goal of this work is to
>>>> improve the reliability of automation for contributing website changes. At
>>>> last measure, only about half of beam-site PR merges use Mergebot without
>>>> experiencing some reliability issue [3].
>>>>
>>>> I've opened BEAM-5459 [4] to track moving our generated docs out of
>>>> git. Thomas, would you have bandwidth to look into this?
>>>>
>>>> [1] https://github.com/apache/beam/pull/6458#issuecomment-423406643
>>>> [2]
>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
>>>> [3]
>>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
>>>> [4] https://issues.apache.org/jira/browse/BEAM-5459
>>>>
>>>> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise wrote:
>>>>
>>>>> Hi Udi,
>>>>>
>>>>> Good to know you will continue this work.
>>>>>
>>>>> Let me know if you want to try the buildbot route (which does not
>>>>> require generated documentation to be checked into the repo). Happy to help
>>>>> with that.
>>>>>
>>>>> Thomas
>>>>>
>>>>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri wrote:
>>>>>
>>>>>> I'm picking up the website migration. The plan is to not include
>>>>>> generated files in the master branch.
>>>>>>
>>>>>> However, I've been told that even putting generated files in a separate
>>>>>> branch could blow up the git repository for all (e.g. make git pulls a lot
>>>>>> longer?).
>>>>>> Not sure if this is a real issue or not.
>>>>>>
>>>>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw
>>>>>> wrote:
>>>>>>
>>>>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise wrote:
>>>>>>> >
>>>>>>> > Yes, I think the separation of generated code will need to occur
>>>>>>> prior to

Re: Removing documentation for old Beam versions

2018-09-24 Thread Thomas Weise
My thought was to leave the asf-site branch in the beam-site repository,
add generated docs to that branch (until we have a better solution), and
have only sources in the beam repo.

Scott had filed https://issues.apache.org/jira/browse/BEAM-5459 - it would
eliminate the need to place generated docs into git repos.

On Mon, Sep 24, 2018 at 11:06 AM Udi Meiri  wrote:

> I believe that beam.apache.org is populated from the asf-site branch of
> the apache/beam-site repo. (gitpubsub:
> https://www.apache.org/dev/project-site.html#intro)
> If we move the markdown-based docs to apache/beam, leave generated javadoc
> and pydoc in apache/beam-site, and point gitpubsub to apache/beam, then
> javadoc and pydoc will not get pushed to the website.
>
> Is there some place where we can push javadoc and pydoc files? Or perhaps
> there is an alternative way to push updates to beam.apache.org? (not
> requiring the asf-site branch)
>
> On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise  wrote:
>
>> Hi Scott,
>>
>> Thanks for bringing the discussion back here.
>>
>> I agree that we should separate the changes for hosting of generated
>> java/pydocs from the rest of website automation so that we can make the
>> switch and fix the contributor headache soon.
>>
>> But perhaps we can avoid adding 4m lines of generated code to the main
>> beam repository (and keep on adding with every release) if we continue to
>> serve the site from the old beam-site repo? (I left a comment in the doc.)
>>
>> About trying buildbot, as mentioned earlier I would be happy to help with
>> it. I prefer a setup that keeps the docs separate from the web site.
>>
>> Thomas
>>
>>
>> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner  wrote:
>>
>>> Re-opening this thread as it came up today in the discussion for PR#6458
>>> [1]. This PR is part of the work for Beam-Site Automation Reliability
>>> improvements; design doc here: https://s.apache.org/beam-site-automation
>>>
>>> The current plan is to keep generated javadoc/pydoc sources only on the
>>> asf-site branch, which is necessary for the current githubpubsub publishing
>>> mechanism. This maintains our current approach, the only change being that
>>> we're moving the asf-site branch from the retiring apache/beam-site
>>> repository into a new apache/beam repo branch.
>>>
>>> The concern for committing generated content is the extra overhead
>>> during git fetch. I did some analysis to measure the impact [2], and found
>>> that fetching a week of source + generated content history from
>>> apache/beam-site took 0.39 seconds.
>>>
>>> I like the idea of publishing javadoc/pydoc snapshots to an external
>>> location like Flink does with buildbot, but that work is separable and
>>> shouldn't be a prerequisite for this effort. The goal of this work is to
>>> improve the reliability of automation for contributing website changes. At
>>> last measure, only about half of beam-site PR merges use Mergebot without
>>> experiencing some reliability issue [3].
>>>
>>> I've opened BEAM-5459 [4] to track moving our generated docs out of git.
>>> Thomas, would you have bandwidth to look into this?
>>>
>>> [1] https://github.com/apache/beam/pull/6458#issuecomment-423406643
>>> [2]
>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
>>> [3]
>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
>>> [4] https://issues.apache.org/jira/browse/BEAM-5459
>>>
>>> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise  wrote:
>>>
>>>> Hi Udi,
>>>>
>>>> Good to know you will continue this work.
>>>>
>>>> Let me know if you want to try the buildbot route (which does not
>>>> require generated documentation to be checked into the repo). Happy to help
>>>> with that.
>>>>
>>>> Thomas
>>>>
>>>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri wrote:
>>>>
>>>>> I'm picking up the website migration. The plan is to not include
>>>>> generated files in the master branch.
>>>>>
>>>>> However, I've been told that even putting generated files in a separate
>>>>> branch could blow up the git repository for all (e.g. make git pulls a lot
>>>>> longer?).
>>>>> Not sure if this is a real issue or not.
>>>>>
>>>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw
>>>>> wrote:
>>>>>
>>>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise wrote:
>>>>>> >
>>>>>> > Yes, I think the separation of generated code will need to occur
>>>>>> prior to completing the merge and switching the web site to the main repo.
>>>>>> >
>>>>>> > There should be no reason to check generated documentation into
>>>>>> either of the repos/branches.
>>>>>>
>>>>>> Huge +1 to this. Thomas, would you have time to set something like this up
>>>>>> for Beam? If not, could anyone else pick this up?
>>>>>>
>>>>>> > Please see as an example how this was solved in Flink, using the
>>>>>> ASF buildbot infrastructure.
>>>>>> >
>>>>>> > Documentation per version/release, for example:
>>>>>> >
>>>>>>

Re: Removing documentation for old Beam versions

2018-09-24 Thread Udi Meiri
I believe that beam.apache.org is populated from the asf-site branch of the
apache/beam-site repo. (gitpubsub:
https://www.apache.org/dev/project-site.html#intro)
If we move the markdown-based docs to apache/beam, leave generated javadoc
and pydoc in apache/beam-site, and point gitpubsub to apache/beam, then
javadoc and pydoc will not get pushed to the website.

Is there some place where we can push javadoc and pydoc files? Or perhaps
there is an alternative way to push updates to beam.apache.org? (not
requiring the asf-site branch)

On Fri, Sep 21, 2018 at 6:40 PM Thomas Weise  wrote:

> Hi Scott,
>
> Thanks for bringing the discussion back here.
>
> I agree that we should separate the changes for hosting of generated
> java/pydocs from the rest of website automation so that we can make the
> switch and fix the contributor headache soon.
>
> But perhaps we can avoid adding 4m lines of generated code to the main
> beam repository (and keep on adding with every release) if we continue to
> serve the site from the old beam-site repo? (I left a comment in the doc.)
>
> About trying buildbot, as mentioned earlier I would be happy to help with
> it. I prefer a setup that keeps the docs separate from the web site.
>
> Thomas
>
>
> On Fri, Sep 21, 2018 at 10:28 AM Scott Wegner  wrote:
>
>> Re-opening this thread as it came up today in the discussion for PR#6458
>> [1]. This PR is part of the work for Beam-Site Automation Reliability
>> improvements; design doc here: https://s.apache.org/beam-site-automation
>>
>> The current plan is to keep generated javadoc/pydoc sources only on the
>> asf-site branch, which is necessary for the current githubpubsub publishing
>> mechanism. This maintains our current approach, the only change being that
>> we're moving the asf-site branch from the retiring apache/beam-site
>> repository into a new apache/beam repo branch.
>>
>> The concern for committing generated content is the extra overhead during
>> git fetch. I did some analysis to measure the impact [2], and found that
>> fetching a week of source + generated content history from apache/beam-site
>> took 0.39 seconds.
>>
>> I like the idea of publishing javadoc/pydoc snapshots to an external
>> location like Flink does with buildbot, but that work is separable and
>> shouldn't be a prerequisite for this effort. The goal of this work is to
>> improve the reliability of automation for contributing website changes. At
>> last measure, only about half of beam-site PR merges use Mergebot without
>> experiencing some reliability issue [3].
>>
>> I've opened BEAM-5459 [4] to track moving our generated docs out of git.
>> Thomas, would you have bandwidth to look into this?
>>
>> [1] https://github.com/apache/beam/pull/6458#issuecomment-423406643
>> [2]
>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.uqzivheohd7j
>> [3]
>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#heading=h.a208cwi78xmu
>> [4] https://issues.apache.org/jira/browse/BEAM-5459
>>
>> On Fri, Aug 24, 2018 at 11:48 AM Thomas Weise  wrote:
>>
>>> Hi Udi,
>>>
>>> Good to know you will continue this work.
>>>
>>> Let me know if you want to try the buildbot route (which does not
>>> require generated documentation to be checked into the repo). Happy to help
>>> with that.
>>>
>>> Thomas
>>>
>>> On Fri, Aug 24, 2018 at 11:36 AM Udi Meiri  wrote:
>>>
>>>> I'm picking up the website migration. The plan is to not include
>>>> generated files in the master branch.
>>>>
>>>> However, I've been told that even putting generated files in a separate
>>>> branch could blow up the git repository for all (e.g. make git pulls a lot
>>>> longer?).
>>>> Not sure if this is a real issue or not.
>>>>
>>>> On Mon, Aug 20, 2018 at 2:53 AM Robert Bradshaw
>>>> wrote:
>>>>
>>>>> On Sun, Aug 5, 2018 at 5:28 AM Thomas Weise wrote:
>>>>> >
>>>>> > Yes, I think the separation of generated code will need to occur
>>>>> prior to completing the merge and switching the web site to the main repo.
>>>>> >
>>>>> > There should be no reason to check generated documentation into
>>>>> either of the repos/branches.
>>>>>
>>>>> Huge +1 to this. Thomas, would you have time to set something like this up
>>>>> for Beam? If not, could anyone else pick this up?
>>>>>
>>>>> > Please see as an example how this was solved in Flink, using the ASF
>>>>> buildbot infrastructure.
>>>>> >
>>>>> > Documentation per version/release, for example:
>>>>> >
>>>>> > https://ci.apache.org/projects/flink/flink-docs-release-1.5/
>>>>> >
>>>>> > The buildbot configuration is here (requires committer access):
>>>>> >
>>>>> >
>>>>> https://svn.apache.org/repos/infra/infrastructure/buildbot/aegis/buildmaster/master1/projects/flink.conf
>>>>> >
>>>>> > Thanks,
>>>>> > Thomas
>>>>> >
>>>>> > On Thu, Aug 2, 2018 at 6:46 PM Mikhail Gryzykhin
>>>>> wrote:
>>>>> >>
>>>>> >> Last time I talked with Scott I brought this idea in. I believe the
>>>>> plan was

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-24 Thread Nithin Sujir
Hi,
Do we have an ETA on the 2.7.0 release?

Thanks,
Nithin.


On Fri, Sep 21, 2018 at 12:10 AM Romain Manni-Bucau 
wrote:

> Hi Charles,
>
> I didn't get a chance to work more on it but the sample shows that just
> changing the Beam version breaks existing code.
>
> Since Beam does not manage its dependency compatibility with runners -
> understand this as "it never managed this issue" - I guess you can proceed
> with 2.7 ignoring this breaking change. For > 2.7.0 versions, testing with
> the officially supported engine versions (kind of matrix compatibility) can
> be required with some advanced apps (with sides, unions, etc... maybe
> nextmark can be a start?).
>
> My blind guess is that 2.6 was compiled with spark 2.2.1 and 2.7 with
> spark 2.3.1 and therefore the method can have changed even if signature
> didn't (thanks scala and java method lookup which uses returned types vs
> signatures ignoring them).
> An interesting test would be to compile Beam 2.7.0 with spark 2.2.1 and
> run it with my project, I guess it would work.
>
> Side note: during my tests i realized that if you use avro 1.8 new API it
> fails in spark since only spark master was upgraded to avro 1.7 and not
> earlier versions so beam providing avro 1.8 is another issue.
>
> Anyway, fine to proceed on my side even if there is a "user regression".
> Nobody being available to identify it would mean delaying the release
> too much, and Beam is far from being only the Spark runner, so there is no
> reason to block others ;).
>
> On Fri, Sep 21, 2018 at 03:32, Ahmet Altay wrote:
>
>> Good point. However, we agreed that our release policy would be to patch
>> only long term support (LTS) releases. Given that we have not made any LTS
>> releases yet, perhaps we should use 2.8.0 as the opportunity to make our
>> first LTS release.
>>
>> On Thu, Sep 20, 2018 at 6:26 PM, Thomas Weise  wrote:
>>
>>> That's not the same for a user though. 2.7.1 would be a patch compatible
>>> release that only fixes bugs. 2.8.0 adds new features and potentially also
>>> new issues..
>>>
>>> On Thu, Sep 20, 2018 at 3:16 PM Ahmet Altay  wrote:
>>>
>>>> +1 to Thomas's suggestion. Instead of 2.7.1 we can follow up with 2.8.0
>>>> though. 2.8.0 has a release branch cut date of 10/10 according to our
>>>> release calendar.
>>>>
>>>> On Thu, Sep 20, 2018 at 2:47 PM, Connell O'Callaghan <
>>>> conne...@google.com> wrote:
>>>>
>>>>> +1 to Thomas's suggestion - if Charles or others cannot reproduce.
>>>>>
>>>>> On Thu, Sep 20, 2018 at 2:40 PM Thomas Weise wrote:
>>>>>
>>>>>> We can also consider releasing 2.7.0 and then follow up with 2.7.1 if
>>>>>> the problem can be reproduced and requires a fix. Just food for thought :)
>>>>>>
>>>>>>
>>>>>> On Thu, Sep 20, 2018 at 2:13 PM Charles Chen wrote:
>>>>>>
>>>>>>> My mistake, it looks like the correct beam staging repository (
>>>>>>> https://repository.apache.org/content/repositories/orgapachebeam-1046/)
>>>>>>> is specified in your pom file.
>>>>>>>
>>>>>>> On Thu, Sep 20, 2018 at 2:10 PM Charles Chen wrote:
>>>>>>>
>>>>>>>> Hey Romain and JB, do you have any progress on this?  One thing I
>>>>>>>> would like to point out is that 2.7.0 isn't yet pushed to Maven Central, so
>>>>>>>> referring to it by version is not expected to work (and it looks like this
>>>>>>>> is what is done in your repo:
>>>>>>>> https://github.com/rmannibucau/beam-2.7.0-fails).  Luke indicated
>>>>>>>> above that he doesn't see any dependency changes.  Can you isolate and
>>>>>>>> reproduce this problem so that we can develop a fix, if necessary?  I would
>>>>>>>> like to proceed with an RC2 as soon as possible.
>>>>>>>>
>>>>>>>> On Wed, Sep 19, 2018 at 6:37 AM Romain Manni-Bucau <
>>>>>>>> rmannibu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Quick update on the spark issue: I didn't get enough time to
>>>>>>>>> identify it clearly but managed to have a passing run of my test changing a
>>>>>>>>> bunch of versions.
>>>>>>>>> I suspect my code triggers some class conflicting between spark
>>>>>>>>> and my shade leading to a serialization issue. I didn't test
>>>>>>>>> userClassPathFirst option of spark but it can be an interesting thing to
>>>>>>>>> enable in beam runner.
>>>>>>>>> However it is still very confusing to have it not running just
>>>>>>>>> upgrading beam version and the spark error is very hard to understand.
>>>>>>>>>
>>>>>>>>> Romain Manni-Bucau
>>>>>>>>> @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Sep 18, 2018 at 20:17, Lukasz Cwik wrote:
>>>>>>>>>
>>>>>>>>>> Romain hinted that this was a dependency issue but when comparing
Re: Python SDK: .options deprecation

2018-09-24 Thread Udi Meiri
Sindy, I don't believe that pipeline.options() works: options is the
deprecated @property method.

Ahmet, the first paragraph of this doc seems to have some background about
the general direction: https://s.apache.org/no-beam-pipeline
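
A small sketch may help show why pipeline.options() cannot work when options is a deprecated @property, and what the run(pipeline, options) signature mentioned in the quoted reply looks like. The `Pipeline` and `Runner` classes below are simplified stand-ins written for this illustration; they are not the actual Beam classes and do not reproduce Beam's deprecation message.

```python
import warnings


class Pipeline:
    """Simplified stand-in for a pipeline that still exposes `options`."""

    def __init__(self, options=None):
        self._options = dict(options or {})

    @property
    def options(self):
        # Reading the property still works, but warns the caller.
        warnings.warn(
            "Pipeline.options is deprecated; pass options to the runner.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._options


class Runner:
    """Simplified stand-in for a runner taking options separately."""

    def run(self, pipeline, options):
        # The runner receives the options explicitly instead of reading
        # them off the pipeline object.
        return sorted(options.items())
```

Because `options` is a property, `pipeline.options` returns the options object itself, so `pipeline.options()` tries to call that object and, in this sketch, raises a TypeError.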


On Fri, Sep 21, 2018 at 6:17 PM Ahmet Altay  wrote:

> If I remember correctly, this was related to change of the signature of
> run on runners to run(pipeline, options) instead of just run(pipeline). A
> runner accepts a pipeline and set of options to run that pipeline, instead
> of a pipeline and options being a combined thing.
>
> It would be good to update the comments or bug with some explanation.
>
> On Fri, Sep 21, 2018 at 2:20 PM, Sindy Li  wrote:
>
>> I think it's just deprecating referencing the option as
>>     pipeline.options
>> because it is made into a private variable, but we can still do
>>     pipeline.options()
>>
>> On Fri, Sep 21, 2018 at 2:11 PM Udi Meiri  wrote:
>>
>>> Hey, does anybody know why the pipeline.options property was deprecated?
>>> I found this bug: https://issues.apache.org/jira/browse/BEAM-2124
>>> but there's no explanation.
>>>
>>
>



Re: Resolving Go SDK build/test failures when using gradle

2018-09-24 Thread Lukasz Cwik
Are we setting all the intra-project dependencies[1] within each of our Go
based build.gradle files?

I ask because typically I would suspect that the build system would attempt
to get the dependency from its dependency management section if it wasn't
declared within.

1:
https://github.com/apache/beam/blob/a5bc2cbf07eb46d0af208190a2d828b96421fdab/sdks/go/test/build.gradle#L33

On Fri, Sep 21, 2018 at 4:24 PM Robert Burke  wrote:

> If you haven't run into :beam-sdks-go:buildLinuxAmd64 or similar failing
> with "undefined: passert.Sum" recently, stop reading now.
>
> The root cause is that the gogradle plugin doesn't clean up the vendor
> directories that it sets up. Combined with using gradle to build Go, this
> leads to gradle vendoring the beam package in the sdks/go/test,
> sdks/go/examples, and sdks/go/container directories.
> This vendoring is persistent on the local client, and isn't cleaned up by
> the clean task.
>
> The immediate fix is to navigate to the sdks/go directory and each of the
> sdks/go/{test|examples|container} directories, and delete the vendor and
> .gogradle directories.
>
> E.g., run the following from your beam git root, if you're using a
> Unix-like system:
>
> rm -rf sdks/go/{vendor,.gogradle} \
>   sdks/go/{test,examples,container}/{vendor,.gogradle}
>
> Then try running your gradle command again.
>
> Go gradle will create the same vendored directories, but will at least
> have a more up to date version of the Go SDK.
>
> The short term fix would be to fix the recursive clean tasks that affect
> the go builds to also remove the vendor directories, if not targeting the
> vendored copy of beam specifically.
>
> This situation is awful, and the long term fix would probably be to
> use Go Modules directly (see BEAM-5379) and replace the
> gogradle plugin with targeted bash scripts, allowing the go tool to manage
> dependencies and build artifacts directly.
>
> If you're familiar with gradle and could provide guidance on customizing a
> task, so it's properly invoked on the general clean task, it would be much
> appreciated. BEAM-5465 has been filed to track that work.
>
> Thank you for your patience,
> Robert B
>
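
For anyone not on a Unix-like shell, the manual cleanup described in the quoted message could be sketched in Python as below. The directory names come from the message; the function name clean_go_vendoring and the constants are made up for illustration, and this is not part of the Beam build.

```python
import shutil
from pathlib import Path

# Directories named in the quoted message, relative to the beam git root.
GO_DIRS = ["sdks/go", "sdks/go/test", "sdks/go/examples", "sdks/go/container"]
# Directories left behind by the gogradle plugin.
STALE = ["vendor", ".gogradle"]


def clean_go_vendoring(beam_root):
    """Delete stale vendor/.gogradle directories; return the paths removed."""
    removed = []
    for go_dir in GO_DIRS:
        for name in STALE:
            target = Path(beam_root) / go_dir / name
            # Skip anything that does not exist, so reruns are harmless.
            if target.is_dir():
                shutil.rmtree(target)
                removed.append(target)
    return removed
```

Calling clean_go_vendoring(".") from the beam git root would remove the same directories as the rm -rf command, after which the gradle command can be retried.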


Re: [jira] [Commented] (BEAM-5468) Allow runner to set worker log level in Python SDK harness.

2018-09-24 Thread Robert Bradshaw
I think it may already be a pipeline option on the Java side. I'd rather
minimize the number of environment variables we are using to control
behavior, but I haven't looked at how hard it is to plumb this through.

On Mon, Sep 24, 2018 at 2:21 PM Thomas Weise (JIRA)  wrote:

>
> [
> https://issues.apache.org/jira/browse/BEAM-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625743#comment-16625743
> ]
>
> Thomas Weise commented on BEAM-5468:
> 
>
> I was also planning to look at this. My idea was to add another
> environment variable so this can be controlled from the job bundle factory
> / environment manager, and it can follow the logging level on the runner
> side.  Do you think this should be a pipeline option instead?
>
> > Allow runner to set worker log level in Python SDK harness.
> > ---
> >
> > Key: BEAM-5468
> > URL: https://issues.apache.org/jira/browse/BEAM-5468
> > Project: Beam
> >  Issue Type: Improvement
> >  Components: sdk-py-harness
> >Reporter: Robert Bradshaw
> >Assignee: Robert Bradshaw
> >Priority: Major
> >
>
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>
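
One possible way to combine the two mechanisms discussed in this thread is sketched below: an explicit pipeline option wins, an environment variable set by the runner comes next, and the runner's own level is the fallback. The option key, the variable name, and the function are all hypothetical; nothing here is taken from the actual SDK harness.

```python
import logging

# Hypothetical names; neither appears in the thread or in the SDK.
LOG_LEVEL_OPTION = "harness_log_level"
LOG_LEVEL_ENV_VAR = "HARNESS_LOG_LEVEL"


def resolve_harness_log_level(pipeline_options, environ,
                              runner_level=logging.INFO):
    """Pick a level: pipeline option, then environment, then runner level."""
    name = (pipeline_options.get(LOG_LEVEL_OPTION)
            or environ.get(LOG_LEVEL_ENV_VAR))
    if not name:
        return runner_level
    level = logging.getLevelName(name.upper())
    # For an unknown name getLevelName returns a string, not an int,
    # so fall back to the runner's level in that case.
    return level if isinstance(level, int) else runner_level
```

A harness could then call logging.getLogger().setLevel(resolve_harness_log_level(options_dict, os.environ)) at startup, where options_dict is whatever mapping of pipeline options it receives.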


Beam Dependency Check Report (2018-09-24)

2018-09-24 Thread Apache Jenkins Server

High Priority Dependency Updates Of Beam Python SDK:


Dependency Name | Current Version | Latest Version | Release Date Of the Current Used Version | Release Date Of The Latest Release | JIRA Issue
google-cloud-bigquery | 0.25.0 | 1.5.1 | 2017-06-26 | 2017-06-26 | BEAM-5469
google-cloud-pubsub | 0.26.0 | 0.38.0 | 2017-06-26 | 2017-06-26 | BEAM-5398

High Priority Dependency Updates Of Beam Java SDK:


Dependency Name | Current Version | Latest Version | Release Date Of the Current Used Version | Release Date Of The Latest Release | JIRA Issue
org.assertj:assertj-core | 2.5.0 | 3.11.1 | 2016-07-03 | 2018-08-28 | BEAM-5289
com.google.auto.service:auto-service | 1.0-rc2 | 1.0-rc4 | 2014-10-25 | 2017-12-11 | BEAM-4874
biz.aQute:bndlib | 1.43.0 | 2.0.0.20130123-133441 | 2011-04-01 | 2013-02-27 | BEAM-4884
com.gradle:build-scan-plugin | 1.13.1 | 1.16 | 2018-04-10 | 2018-08-27 | BEAM-5224
org.apache.cassandra:cassandra-all | 3.9 | 3.11.3 | 2016-09-26 | 2018-07-25 | BEAM-5083
commons-cli:commons-cli | 1.2 | 1.4 | 2009-03-19 | 2017-03-09 | BEAM-4896
commons-codec:commons-codec | 1.9 | 1.11 | 2013-12-21 | 2017-10-17 | BEAM-4898
org.apache.commons:commons-dbcp2 | 2.1.1 | 2.5.0 | 2015-08-03 | 2018-07-13 | BEAM-4900
com.typesafe:config | 1.3.0 | 1.3.3 | 2015-05-08 | 2018-02-21 | BEAM-4902
de.flapdoodle.embed:de.flapdoodle.embed.mongo | 1.50.1 | 2.1.1 | 2015-12-11 | 2018-06-21 | BEAM-4904
de.flapdoodle.embed:de.flapdoodle.embed.process | 1.50.1 | 2.0.5 | 2015-12-10 | 2018-06-21 | BEAM-4905
org.elasticsearch:elasticsearch-hadoop | 5.0.0 | 6.4.1 | 2016-10-26 | 2018-09-13 | BEAM-5470
net.ltgt.gradle:gradle-apt-plugin | 0.13 | 0.18 | 2017-11-01 | 2018-07-23 | BEAM-4924
com.google.code.gson:gson | 2.7 | 2.8.5 | 2016-06-14 | 2018-05-22 | BEAM-4947
com.google.guava:guava | 20.0 | 26.0-jre | 2016-10-28 | 2018-08-01 | BEAM-5085
org.apache.hbase:hbase-common | 1.2.6 | 2.1.0 | 2017-05-29 | 2018-07-10 | BEAM-4951
org.apache.hbase:hbase-hadoop-compat | 1.2.6 | 2.1.0 | 2017-05-29 | 2018-07-10 | BEAM-4952
org.apache.hbase:hbase-hadoop2-compat | 1.2.6 | 2.1.0 | 2017-05-29 | 2018-07-10 | BEAM-4953
org.apache.hbase:hbase-server | 1.2.6 | 2.1.0 | 2017-05-29 | 2018-07-10 | BEAM-4954
org.apache.hbase:hbase-shaded-client | 1.2.6 | 2.1.0 | 2017-05-29 | 2018-07-10 | BEAM-4955
org.apache.hbase:hbase-shaded-server | 1.2.6 | 2.0.0-alpha2 | 2017-05-29 | 2017-08-16 | BEAM-4956
org.apache.hive:hive-cli | 2.1.0 | 3.1.0.3.0.1.0-187 | 2016-06-16 | 2018-09-24 | BEAM-5471
org.apache.hive:hive-common | 2.1.0 | 3.1.0.3.0.1.0-187 | 2016-06-16 | 2018-09-24 | BEAM-5472
org.apache.hive:hive-exec | 2.1.0 | 3.1.0.3.0.1.0-187 | 2016-06-16 | 2018-09-24 | BEAM-5473
org.apache.hive.hcatalog:hive-hcatalog-core | 2.1.0 | 3.1.0.3.0.1.0-187 | 2016-06-16 | 2018-09-24 | BEAM-5474
net.java.dev.javacc:javacc | 4.0 | 7.0.4 | 2006-03-17 | 2018-09-17 | BEAM-5475
org.slf4j:jcl-over-slf4j | 1.7.25 | 1.8.0-beta2 | 2017-03-16 | 2018-03-21 | BEAM-5234
net.java.dev.jna:jna | 4.1.0 | 4.5.2 | 2014-03-06 | 2018-07-12 | BEAM-4973
com.esotericsoftware.kryo:kryo | 2.21 | 2.24.0 | 2013-02-27 | 2014-05-04 | BEAM-4975
org.apache.kudu:kudu-client | 1.4.0 | 1.7.1 | 2017-06-05 | 2018-05-30 | BEAM-5087
io.dropwizard.metrics:metrics-core | 3.1.2 | 4.1.0-rc2 | 2015-04-26 | 2018-05-03 | BEAM-4977
org.mongodb:mongo-java-driver | 3.2.2 | 3.8.2 | 2016-02-15 | 2018-09-19 | BEAM-5476
io.netty:netty-all | 4.1.17.Final | 5.0.0.Alpha2 | 2017-11-08 | 2015-03-03 | BEAM-4981
io.opencensus:opencensus-api | 0.12.3 | 0.16.1 | 2018-04-13 | 2018-09-18 | BEAM-5477
io.opencensus:opencensus-contrib-grpc-metrics | 0.12.3 | 0.16.1 | 2018-04-13 | 2018-09-18 | BEAM-5478
org.apache.qpid:proton-j | 0.13.1 | 0.29.0 | 2016-07-02 | 2018-08-10 | BEAM-5153
com.carrotsearch.randomizedtesting:randomizedtesting-runner
2.5.2
2.7.0
2017-07-04