Re: [DISCUSS] Remove JavaBeamZetaSQL and JavaBeamZetaSQLJava11 PreCommits

2020-07-28 Thread Kenneth Knowles
Yes, it seems like we probably have redundancy. Based on
https://github.com/apache/beam/pull/9210#discussion_r315899335 it sounds
like you added the ZetaSQL precommit when the SQL precommit was run as part
of the Java precommit. Since then, the SQL precommit was separated from
Java. It makes sense to me to have all the SQL tests be one precommit.

I have done some looking to see if we can exclude the SQL tests from the
basic Java precommit (which calls "buildDependents") but I have not figured
it out.

Kenn

On Tue, Jul 28, 2020 at 4:13 PM Rui Wang  wrote:

> Hi community,
>
> I observed that when there is a BeamSQL related PR, there will be four
> precommits: the SQL precommit, the JavaBeamZetaSQL precommit, and their Java11
> versions. It turns out that the SQL precommit will run tests from the Beam
> ZetaSQL module.
>
> Thus we can remove JavaBeamZetaSQL and JavaBeamZetaSQLJava11 precommits
> while still keeping the same level of testing on BeamSQL PRs.
>
> Do you agree? Does anyone know what is the right order of operations to
> remove a precommit from Beam?
>
>
>
> -Rui
>


[DISCUSS] Remove JavaBeamZetaSQL and JavaBeamZetaSQLJava11 PreCommits

2020-07-28 Thread Rui Wang
Hi community,

I observed that when there is a BeamSQL related PR, there will be four
precommits: the SQL precommit, the JavaBeamZetaSQL precommit, and their Java11
versions. It turns out that the SQL precommit will run tests from the Beam
ZetaSQL module.

Thus we can remove JavaBeamZetaSQL and JavaBeamZetaSQLJava11 precommits
while still keeping the same level of testing on BeamSQL PRs.

Do you agree? Does anyone know what is the right order of operations to
remove a precommit from Beam?



-Rui


Re: No space left on device - beam-jenkins 1 and 7

2020-07-28 Thread Kenneth Knowles
Cool. If it is /home/jenkins it should be just fine. Thanks for checking!

Kenn

On Tue, Jul 28, 2020 at 10:23 AM Damian Gadomski <
damian.gadom...@polidea.com> wrote:

> Sorry, mistake while copying, [1] should be:
> [1]
> https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L63
>
>
> On Tue, Jul 28, 2020 at 7:21 PM Damian Gadomski <
> damian.gadom...@polidea.com> wrote:
>
>> That's interesting. I didn't check that myself but all the Jenkins jobs
>> are configured to wipe the workspace just before the actual build happens
>> [1].
>> Git SCM plugin is used for that and it enables the option called "Wipe out
>> repository and force clone". Docs state that it "deletes the contents of
>> the workspace before build and before checkout" [2]. Therefore I assume that removing
>> workspace just after the build won't change anything.
>>
>> The ./.gradle/caches/modules-2/files-2.1 dir is indeed present on the
>> worker machines but it's rather in /home/jenkins dir.
>>
>> damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
>> 11G .
>> damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
>> caches/modules-2/files-2.1
>> 2.3G caches/modules-2/files-2.1
>>
>> I can't find that directory structure inside workspaces.
>>
>> damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$
>> sudo find -name "files-2.1"
>> damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$
>>
>> [1]
>> https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L6
>> [2] https://plugins.jenkins.io/git/
>>
>> On Tue, Jul 28, 2020 at 5:47 PM Kenneth Knowles  wrote:
>>
>>> Just checking - will this wipe out dependency cache? That will slow
>>> things down and significantly increase flakiness. If I recall correctly,
>>> the default Jenkins layout was:
>>>
>>> /home/jenkins/jenkins-slave/workspace/$jobname
>>> /home/jenkins/jenkins-slave/workspace/$jobname/.m2
>>> /home/jenkins/jenkins-slave/workspace/$jobname/.git
>>>
>>> Where you can see that it did a `git clone` right into the root
>>> workspace directory, adjacent to .m2. This was not hygienic. One important
>>> thing was that `git clean` would wipe the maven cache with every build. So
>>> in https://github.com/apache/beam/pull/3976 we changed it to
>>>
>>> /home/jenkins/jenkins-slave/workspace/$jobname
>>> /home/jenkins/jenkins-slave/workspace/$jobname/.m2
>>> /home/jenkins/jenkins-slave/workspace/$jobname/src/.git
>>>
>>> Now the .m2 directory survives and we do not constantly see flakes
>>> re-downloading deps that are immutable. This does, of course, use disk
>>> space.
>>>
>>> That was in the maven days. Gradle is the same except for $HOME/.m2 is
>>> replaced by $HOME/.gradle/caches/modules-2/files-2.1. Is Jenkins configured
>>> the same way so we will be wiping out the dependencies? If so, can you
>>> address this issue? Everything in that directory should be immutable and
>>> just a cache to avoid pointless re-download.
>>>
>>> Kenn
>>>
>>> On Tue, Jul 28, 2020 at 2:25 AM Damian Gadomski <
>>> damian.gadom...@polidea.com> wrote:
>>>
 Agree with Udi, workspaces seem to be the third culprit, not yet
 addressed in any way (until PR#12326
  is merged). I feel that
 it'll solve the issue of filling up the disks for a long time ;)

 I'm also OK with moving /tmp cleanup to option B, and will happily
 investigate on proper TMPDIR config.



 On Tue, Jul 28, 2020 at 3:07 AM Udi Meiri  wrote:

> What about the workspaces, which can take up 175GB in some cases (see
> above)?
> I'm working on getting them cleaned up automatically:
> https://github.com/apache/beam/pull/12326
>
> My opinion is that we would get more mileage out of fixing the jobs
> that leave behind files in /tmp and images/containers in Docker.
> This would also help keep development machines clean.
>
>
> On Mon, Jul 27, 2020 at 5:31 PM Tyson Hamilton 
> wrote:
>
>> Here is a summary of how I understand things,
>>
>>   - /tmp and /var/lib/docker are the culprit for filling up disks
>>   - inventory Jenkins job runs every 12 hours and runs a docker prune
>> to clean up images older than 24hr
>>   - crontab on each machine cleans up /tmp files older than three
>> days weekly
>>
>> This doesn't seem to be working since we're still running out of disk
>> periodically and requiring manual intervention. Knobs and options we have
>> available:
>>
>>   1. increase frequency of deleting files
>>   2. decrease the number of days required to delete a file (e.g.

Re: Versioning published Java containers

2020-07-28 Thread Brian Hulette
Agreed that it makes sense to publish containers built at HEAD - I filed
BEAM-10593 [1] to track that work.

[1] https://issues.apache.org/jira/browse/BEAM-10593

On Wed, Jul 15, 2020 at 12:31 PM Kenneth Knowles  wrote:

> It makes sense to me that the snapshot should be everything needed for a
> release. Definitely containers fit that.
>
> Kenn
>
> On Wed, Jul 15, 2020 at 11:37 AM Chamikara Jayalath 
> wrote:
>
>>
>>
>> On Wed, Jul 15, 2020 at 11:17 AM Kyle Weaver  wrote:
>>
>>> Thanks everyone for the details. Seems like Java 11 support is farther
>>> along than I had imagined :)
>>>
>>> > Is there any progress into getting them
>>> > back, any ticket people can follow if interested?
>>>
>>> https://issues.apache.org/jira/browse/BEAM-10049
>>>
>>> > I understand that a user can publish their own versions of HEAD
>>> containers but this does not work well when developing automated tests for
>>> distributed runners.
>>>
>>> Why not?
>>>
>>
>> I would say the benefits of having regularly published HEAD containers
>> will be similar to the benefits of having daily Beam SNAPSHOT jars
>> published.
>> For example,
>> (1) This will give a common container that all Beam Jenkins tests can
>> refer to when running jobs for distributed runners, for example when
>> running Dataflow jobs
>> (2) This will allow users to easily check fixes to HEAD
>> (3) This will allow users to easily run additional automated tests on
>> Beam HEAD (for example, Google internal tests)
>>
>> For example, we recently started using published Java containers for
>> Dataflow cross-language pipelines. But running the same tests on HEAD
>> requires additional setup.
>>
>> Thanks,
>> Cham
>>
>>
>>>
>>> On Wed, Jul 15, 2020 at 9:25 AM Chamikara Jayalath 
>>> wrote:
>>>
 Can we consider regularly publishing HEAD containers as well (for
 example, we publish SNAPSHOT jars daily) ? I understand that a user can
 publish their own versions of HEAD containers but this does not work well
 when developing automated tests for distributed runners. Apologies if this
 was discussed before.

 Thanks,
 Cham

 On Wed, Jul 15, 2020 at 12:43 AM Ismaël Mejía 
 wrote:

> Thanks Robert for the explanation. Is there any progress into getting them
> back, any ticket people can follow if interested?
>
> On Wed, Jul 15, 2020 at 12:13 AM Robert Burke 
> wrote:
> >
> > Disallowing the go containers was largely due to not having a simple
> check on the go boot code's licenses which is required for containers
> hosted under the main Apache namespace.
> >
> >  A manual verification reveals it's only Go's standard
> library BSD license and gRPC's Apache v2 license. Not impossible but not
> yet done by us. The JIRA issue has a link to the appropriate license 
> finder
> for go packages.
> >
> > The amusing bit is that very similar Go boot code is included in the
> Java and Python containers too, so we're only accidentally in compliance
> with that there, if at all.
> >
> >
> >
> > On Tue, Jul 14, 2020, 2:22 PM Ismaël Mejía 
> wrote:
> >>
> >> +1 for naming as python containers, and quick release so users can
> try it.
> >>
> >> Not related to this thread but I am also curious about the reasons
> to remove the
> >> go docker images, was this discussed/voted in the ML (maybe I
> missed it) ?
> >>
> >> I don't think Beam has been historically a conservative project
> about releasing
> >> early in-progress versions and I have learnt to appreciate this
> because it helps
> >> for early user testing and bug reports which will be definitely a
> must for Java
> >> 11.
> >>
> >> We should read the ticket Kyle mentions with a grain of salt. Most of the
> >> sub-tasks in that ticket are NOT about allowing users to run pipelines with
> >> Java 11 but about being able to fully build and run the tests and the source
> >> code of Beam with Java 11, which is a different goal (important but probably
> >> less for end users) and a task with lots of extra issues because of plugins /
> >> dependent systems etc.
> >>
> >> For the Java 11 harness, what we need to guarantee is that users can run
> >> their code without issues with Java 11, and we can do this now, for example,
> >> by checking that portable runners that support Java 11 pass ValidatesRunner
> >> with the Java 11 harness. Since some classic runners [1] already pass these
> >> tests, it should be relatively 'easy' to do so for portable runners.
> >>
> >> [1]
> https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/
> >>
> >>
> >>
> >>
> >> On Sat, Jul 11, 2020 at 12:43 AM Ahmet Altay 
> wrote:
> >> >
> >> > Related to the naming question, +1 and this will be 

[RELEASE VOTE RESULT] Release 2.23.0, candidate #2

2020-07-28 Thread Valentyn Tymofieiev
I'm happy to announce that we have approved the 2.23.0 release.

There are 6 approving votes, 3 of which are binding:
* Ahmet Altay
* Robert Bradshaw
* Pablo Estrada

Thanks everyone for your help to prepare the release.

I'm going to finalize the release and send out the official release
announcement once it is available.


Re: [VOTE] Release 2.23.0, release candidate #2

2020-07-28 Thread Pablo Estrada
+1 (binding)
I've run Java quickstarts on Flink+Dataflow+DirectRunner.
I've run the Python quickstart on Dataflow+DirectRunner+DataflowV2.

Best
-P.

On Tue, Jul 28, 2020 at 12:03 PM Valentyn Tymofieiev 
wrote:

> To approve the release we would need one more PMC vote.
>
> On Tue, Jul 28, 2020 at 11:59 AM Valentyn Tymofieiev 
> wrote:
>
>> +1.
>>
>> I have verified that Postcommit and ValidatesRunner suites passed on the
>> release branch, checked quickstarts for core runners (local execution only
>> + Dataflow runner) for Java and Python, checked mobile gaming examples on
>> Direct and Dataflow runners, checked that the Docker images contain
>> licenses, and checked that container images released by Dataflow Runner
>> have dependencies that match requirements for apache-beam[gcp].
>>
>> I have discovered that Apache Beam on Python 2.7 with GCP requirements
>> does not install cleanly and gives a warning:
>>
>> ERROR: google-auth 1.19.2 has requirement rsa<4.1; python_version < "3",
>> but you'll have rsa 4.5 which is incompatible.
>>
>> We do not directly depend on google-auth or rsa. google-auth maintainers
>> have merged a fix 7 days ago[1] and the problem should fix itself with the
>> next release of google-auth (ETA: today). The error itself is benign.
>>
>> [1]
>> https://github.com/googleapis/google-auth-library-python/commit/6dd2597bd63be6719a0b088de21ef7e48d9d1884
>>
>>
>> On Tue, Jul 28, 2020 at 11:53 AM Kyle Weaver  wrote:
>>
>>> +1
>>> Ran Python wordcount {2.7, 3.7} x {Spark, Flink 1.10} job server
>>> containers.
>>>
>>> On Thu, Jul 23, 2020 at 10:42 AM Robert Bradshaw 
>>> wrote:
>>>
 +1 (binding)

 I validated the hashes and signatures of all the release artifacts, and
 that the source tarball matches github
 at 5df6e7949629799c46d227171281364144149f5d. I also verified that the diff
 from the last RC contains what is expected and ran some basic pipelines
 against a Python 3 wheel in a fresh virtual environment. All looks good.

 On Thu, Jul 23, 2020 at 9:47 AM Chamikara Jayalath <
 chamik...@google.com> wrote:

> +1 (non-binding)
> Tested Java quickstart and an x-lang pipeline.
>
> Thanks,
> Cham
>
> On Wed, Jul 22, 2020 at 7:34 PM Ahmet Altay  wrote:
>
>> +1 - I validated py3 quickstarts.
>>
>> On Wed, Jul 22, 2020 at 6:21 PM Valentyn Tymofieiev <
>> valen...@google.com> wrote:
>>
>>> Hi everyone,
>>>
>>> Please review and vote on the release candidate #2 for the version
>>> 2.23.0, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>>
>>> The complete staging area is available for your review, which
>>> includes:
>>> * JIRA release notes [1],
>>> * the official Apache source release to be deployed to
>>> dist.apache.org [2], which is signed with the key with fingerprint
>>> 1DF50603225D29A4 [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.23.0-RC2" [5],
>>> * website pull request listing the release [6], publishing the API
>>> reference manual [7], and the blog post [8].
>>> * Java artifacts were built with Maven 3.6.0 and Oracle JDK
>>> 1.8.0_201-b09 .
>>> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2].
>>> * Validation sheet with a tab for 2.23.0 release to help with
>>> validation [9].
>>> * Docker images published to Docker Hub [10].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by
>>> majority approval, with at least 3 PMC affirmative votes.
>>>
>>> Thanks,
>>> Release manager.
>>>
>>> [1]
>>> https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12347145
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.23.0/
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1106/
>>> [5] https://github.com/apache/beam/tree/v2.23.0-RC2
>>> [6] https://github.com/apache/beam/pull/12212
>>> [7] https://github.com/apache/beam-site/pull/605
>>> [8] https://github.com/apache/beam/pull/12213
>>> [9]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=596347973
>>> [10] https://hub.docker.com/search?q=apache%2Fbeam=image
>>>
>>


Re: [VOTE] Release 2.23.0, release candidate #2

2020-07-28 Thread Valentyn Tymofieiev
To approve the release we would need one more PMC vote.

On Tue, Jul 28, 2020 at 11:59 AM Valentyn Tymofieiev 
wrote:

> +1.
>
> I have verified that Postcommit and ValidatesRunner suites passed on the
> release branch, checked quickstarts for core runners (local execution only
> + Dataflow runner) for Java and Python, checked mobile gaming examples on
> Direct and Dataflow runners, checked that the Docker images contain
> licenses, and checked that container images released by Dataflow Runner
> have dependencies that match requirements for apache-beam[gcp].
>
> I have discovered that Apache Beam on Python 2.7 with GCP requirements
> does not install cleanly and gives a warning:
>
> ERROR: google-auth 1.19.2 has requirement rsa<4.1; python_version < "3",
> but you'll have rsa 4.5 which is incompatible.
>
> We do not directly depend on google-auth or rsa. google-auth maintainers
> have merged a fix 7 days ago[1] and the problem should fix itself with the
> next release of google-auth (ETA: today). The error itself is benign.
>
> [1]
> https://github.com/googleapis/google-auth-library-python/commit/6dd2597bd63be6719a0b088de21ef7e48d9d1884
>
>
> On Tue, Jul 28, 2020 at 11:53 AM Kyle Weaver  wrote:
>
>> +1
>> Ran Python wordcount {2.7, 3.7} x {Spark, Flink 1.10} job server
>> containers.
>>
>> On Thu, Jul 23, 2020 at 10:42 AM Robert Bradshaw 
>> wrote:
>>
>>> +1 (binding)
>>>
>>> I validated the hashes and signatures of all the release artifacts, and
>>> that the source tarball matches github
>>> at 5df6e7949629799c46d227171281364144149f5d. I also verified that the diff
>>> from the last RC contains what is expected and ran some basic pipelines
>>> against a Python 3 wheel in a fresh virtual environment. All looks good.
>>>
>>> On Thu, Jul 23, 2020 at 9:47 AM Chamikara Jayalath 
>>> wrote:
>>>
 +1 (non-binding)
 Tested Java quickstart and an x-lang pipeline.

 Thanks,
 Cham

 On Wed, Jul 22, 2020 at 7:34 PM Ahmet Altay  wrote:

> +1 - I validated py3 quickstarts.
>
> On Wed, Jul 22, 2020 at 6:21 PM Valentyn Tymofieiev <
> valen...@google.com> wrote:
>
>> Hi everyone,
>>
>> Please review and vote on the release candidate #2 for the version
>> 2.23.0, as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>>
>> The complete staging area is available for your review, which
>> includes:
>> * JIRA release notes [1],
>> * the official Apache source release to be deployed to
>> dist.apache.org [2], which is signed with the key with fingerprint
>> 1DF50603225D29A4 [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.23.0-RC2" [5],
>> * website pull request listing the release [6], publishing the API
>> reference manual [7], and the blog post [8].
>> * Java artifacts were built with Maven 3.6.0 and Oracle JDK
>> 1.8.0_201-b09 .
>> * Python artifacts are deployed along with the source release to the
>> dist.apache.org [2].
>> * Validation sheet with a tab for 2.23.0 release to help with
>> validation [9].
>> * Docker images published to Docker Hub [10].
>>
>> The vote will be open for at least 72 hours. It is adopted by
>> majority approval, with at least 3 PMC affirmative votes.
>>
>> Thanks,
>> Release manager.
>>
>> [1]
>> https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12347145
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.23.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1106/
>> [5] https://github.com/apache/beam/tree/v2.23.0-RC2
>> [6] https://github.com/apache/beam/pull/12212
>> [7] https://github.com/apache/beam-site/pull/605
>> [8] https://github.com/apache/beam/pull/12213
>> [9]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=596347973
>> [10] https://hub.docker.com/search?q=apache%2Fbeam=image
>>
>


Re: [VOTE] Release 2.23.0, release candidate #2

2020-07-28 Thread Valentyn Tymofieiev
+1.

I have verified that Postcommit and ValidatesRunner suites passed on the
release branch, checked quickstarts for core runners (local execution only
+ Dataflow runner) for Java and Python, checked mobile gaming examples on
Direct and Dataflow runners, checked that the Docker images contain
licenses, and checked that container images released by Dataflow Runner
have dependencies that match requirements for apache-beam[gcp].

I have discovered that Apache Beam on Python 2.7 with GCP requirements does
not install cleanly and gives a warning:

ERROR: google-auth 1.19.2 has requirement rsa<4.1; python_version < "3",
but you'll have rsa 4.5 which is incompatible.

We do not directly depend on google-auth or rsa. google-auth maintainers
have merged a fix 7 days ago[1] and the problem should fix itself with the
next release of google-auth (ETA: today). The error itself is benign.

[1]
https://github.com/googleapis/google-auth-library-python/commit/6dd2597bd63be6719a0b088de21ef7e48d9d1884


On Tue, Jul 28, 2020 at 11:53 AM Kyle Weaver  wrote:

> +1
> Ran Python wordcount {2.7, 3.7} x {Spark, Flink 1.10} job server
> containers.
>
> On Thu, Jul 23, 2020 at 10:42 AM Robert Bradshaw 
> wrote:
>
>> +1 (binding)
>>
>> I validated the hashes and signatures of all the release artifacts, and
>> that the source tarball matches github
>> at 5df6e7949629799c46d227171281364144149f5d. I also verified that the diff
>> from the last RC contains what is expected and ran some basic pipelines
>> against a Python 3 wheel in a fresh virtual environment. All looks good.
>>
>> On Thu, Jul 23, 2020 at 9:47 AM Chamikara Jayalath 
>> wrote:
>>
>>> +1 (non-binding)
>>> Tested Java quickstart and an x-lang pipeline.
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Wed, Jul 22, 2020 at 7:34 PM Ahmet Altay  wrote:
>>>
 +1 - I validated py3 quickstarts.

 On Wed, Jul 22, 2020 at 6:21 PM Valentyn Tymofieiev <
 valen...@google.com> wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #2 for the version
> 2.23.0, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org 
> [2],
> which is signed with the key with fingerprint 1DF50603225D29A4 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.23.0-RC2" [5],
> * website pull request listing the release [6], publishing the API
> reference manual [7], and the blog post [8].
> * Java artifacts were built with Maven 3.6.0 and Oracle JDK
> 1.8.0_201-b09 .
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> * Validation sheet with a tab for 2.23.0 release to help with
> validation [9].
> * Docker images published to Docker Hub [10].
>
> The vote will be open for at least 72 hours. It is adopted by
> majority approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Release manager.
>
> [1]
> https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12347145
> [2] https://dist.apache.org/repos/dist/dev/beam/2.23.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1106/
> [5] https://github.com/apache/beam/tree/v2.23.0-RC2
> [6] https://github.com/apache/beam/pull/12212
> [7] https://github.com/apache/beam-site/pull/605
> [8] https://github.com/apache/beam/pull/12213
> [9]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=596347973
> [10] https://hub.docker.com/search?q=apache%2Fbeam=image
>



Re: [VOTE] Release 2.23.0, release candidate #2

2020-07-28 Thread Kyle Weaver
+1
Ran Python wordcount {2.7, 3.7} x {Spark, Flink 1.10} job server containers.

On Thu, Jul 23, 2020 at 10:42 AM Robert Bradshaw 
wrote:

> +1 (binding)
>
> I validated the hashes and signatures of all the release artifacts, and
> that the source tarball matches github
> at 5df6e7949629799c46d227171281364144149f5d. I also verified that the diff
> from the last RC contains what is expected and ran some basic pipelines
> against a Python 3 wheel in a fresh virtual environment. All looks good.
>
> On Thu, Jul 23, 2020 at 9:47 AM Chamikara Jayalath 
> wrote:
>
>> +1 (non-binding)
>> Tested Java quickstart and an x-lang pipeline.
>>
>> Thanks,
>> Cham
>>
>> On Wed, Jul 22, 2020 at 7:34 PM Ahmet Altay  wrote:
>>
>>> +1 - I validated py3 quickstarts.
>>>
>>> On Wed, Jul 22, 2020 at 6:21 PM Valentyn Tymofieiev 
>>> wrote:
>>>
 Hi everyone,

 Please review and vote on the release candidate #2 for the version
 2.23.0, as follows:
 [ ] +1, Approve the release
 [ ] -1, Do not approve the release (please provide specific comments)


 The complete staging area is available for your review, which includes:
 * JIRA release notes [1],
 * the official Apache source release to be deployed to dist.apache.org [2],
 which is signed with the key with fingerprint 1DF50603225D29A4 [3],
 * all artifacts to be deployed to the Maven Central Repository [4],
 * source code tag "v2.23.0-RC2" [5],
 * website pull request listing the release [6], publishing the API
 reference manual [7], and the blog post [8].
 * Java artifacts were built with Maven 3.6.0 and Oracle JDK
 1.8.0_201-b09 .
 * Python artifacts are deployed along with the source release to the
 dist.apache.org [2].
 * Validation sheet with a tab for 2.23.0 release to help with
 validation [9].
 * Docker images published to Docker Hub [10].

 The vote will be open for at least 72 hours. It is adopted by majority
 approval, with at least 3 PMC affirmative votes.

 Thanks,
 Release manager.

 [1]
 https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12347145
 [2] https://dist.apache.org/repos/dist/dev/beam/2.23.0/
 [3] https://dist.apache.org/repos/dist/release/beam/KEYS
 [4]
 https://repository.apache.org/content/repositories/orgapachebeam-1106/
 [5] https://github.com/apache/beam/tree/v2.23.0-RC2
 [6] https://github.com/apache/beam/pull/12212
 [7] https://github.com/apache/beam-site/pull/605
 [8] https://github.com/apache/beam/pull/12213
 [9]
 https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=596347973
 [10] https://hub.docker.com/search?q=apache%2Fbeam=image

>>>


Re: No space left on device - beam-jenkins 1 and 7

2020-07-28 Thread Damian Gadomski
Sorry, mistake while copying, [1] should be:
[1]
https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L63


On Tue, Jul 28, 2020 at 7:21 PM Damian Gadomski 
wrote:

> That's interesting. I didn't check that myself but all the Jenkins jobs
> are configured to wipe the workspace just before the actual build happens
> [1].
> Git SCM plugin is used for that and it enables the option called "Wipe out
> repository and force clone". Docs state that it "deletes the contents of
> the workspace before build and before checkout" [2]. Therefore I assume that removing
> workspace just after the build won't change anything.
>
> The ./.gradle/caches/modules-2/files-2.1 dir is indeed present on the
> worker machines but it's rather in /home/jenkins dir.
>
> damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
> 11G .
> damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
> caches/modules-2/files-2.1
> 2.3G caches/modules-2/files-2.1
>
> I can't find that directory structure inside workspaces.
>
> damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$
> sudo find -name "files-2.1"
> damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$
>
> [1]
> https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L6
> [2] https://plugins.jenkins.io/git/
>
> On Tue, Jul 28, 2020 at 5:47 PM Kenneth Knowles  wrote:
>
>> Just checking - will this wipe out dependency cache? That will slow
>> things down and significantly increase flakiness. If I recall correctly,
>> the default Jenkins layout was:
>>
>> /home/jenkins/jenkins-slave/workspace/$jobname
>> /home/jenkins/jenkins-slave/workspace/$jobname/.m2
>> /home/jenkins/jenkins-slave/workspace/$jobname/.git
>>
>> Where you can see that it did a `git clone` right into the root workspace
>> directory, adjacent to .m2. This was not hygienic. One important thing was
>> that `git clean` would wipe the maven cache with every build. So in
>> https://github.com/apache/beam/pull/3976 we changed it to
>>
>> /home/jenkins/jenkins-slave/workspace/$jobname
>> /home/jenkins/jenkins-slave/workspace/$jobname/.m2
>> /home/jenkins/jenkins-slave/workspace/$jobname/src/.git
>>
>> Now the .m2 directory survives and we do not constantly see flakes
>> re-downloading deps that are immutable. This does, of course, use disk
>> space.
>>
>> That was in the maven days. Gradle is the same except for $HOME/.m2 is
>> replaced by $HOME/.gradle/caches/modules-2/files-2.1. Is Jenkins configured
>> the same way so we will be wiping out the dependencies? If so, can you
>> address this issue? Everything in that directory should be immutable and
>> just a cache to avoid pointless re-download.
>>
>> Kenn
>>
>> On Tue, Jul 28, 2020 at 2:25 AM Damian Gadomski <
>> damian.gadom...@polidea.com> wrote:
>>
>>> Agree with Udi, workspaces seem to be the third culprit, not yet
>>> addressed in any way (until PR#12326
>>>  is merged). I feel that
>>> it'll solve the issue of filling up the disks for a long time ;)
>>>
>>> I'm also OK with moving /tmp cleanup to option B, and will happily
>>> investigate on proper TMPDIR config.
>>>
>>>
>>>
>>> On Tue, Jul 28, 2020 at 3:07 AM Udi Meiri  wrote:
>>>
 What about the workspaces, which can take up 175GB in some cases (see
 above)?
 I'm working on getting them cleaned up automatically:
 https://github.com/apache/beam/pull/12326

 My opinion is that we would get more mileage out of fixing the jobs
 that leave behind files in /tmp and images/containers in Docker.
 This would also help keep development machines clean.


 On Mon, Jul 27, 2020 at 5:31 PM Tyson Hamilton 
 wrote:

> Here is a summary of how I understand things,
>
>   - /tmp and /var/lib/docker are the culprit for filling up disks
>   - inventory Jenkins job runs every 12 hours and runs a docker prune
> to clean up images older than 24hr
>   - crontab on each machine cleans up /tmp files older than three days
> weekly
>
> This doesn't seem to be working since we're still running out of disk
> periodically and requiring manual intervention. Knobs and options we have
> available:
>
>   1. increase frequency of deleting files
>   2. decrease the number of days required to delete a file (e.g. older
> than 2 days)
>
> The execution methods we have available are:
>
>   A. cron
> - pro: runs even if a job gets stuck in Jenkins due to full disk
> - con: config baked into VM which is tough to update, not
> discoverable or documented well
>   B. inventory job
>

Re: No space left on device - beam-jenkins 1 and 7

2020-07-28 Thread Damian Gadomski
That's interesting. I didn't check that myself but all the Jenkins jobs are
configured to wipe the workspace just before the actual build happens [1].
Git SCM plugin is used for that and it enables the option called "Wipe out
repository and force clone". Docs state that it "deletes the contents of
the workspace before build and before checkout" [2]. Therefore I assume that
removing workspace just after the build won't change anything.

The ./.gradle/caches/modules-2/files-2.1 dir is indeed present on the
worker machines but it's rather in /home/jenkins dir.

damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
11G .
damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
caches/modules-2/files-2.1
2.3G caches/modules-2/files-2.1

I can't find that directory structure inside workspaces.

damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$
sudo find -name "files-2.1"
damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$

[1]
https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L6
[2] https://plugins.jenkins.io/git/

On Tue, Jul 28, 2020 at 5:47 PM Kenneth Knowles  wrote:

> Just checking - will this wipe out dependency cache? That will slow things
> down and significantly increase flakiness. If I recall correctly, the
> default Jenkins layout was:
>
> /home/jenkins/jenkins-slave/workspace/$jobname
> /home/jenkins/jenkins-slave/workspace/$jobname/.m2
> /home/jenkins/jenkins-slave/workspace/$jobname/.git
>
> Where you can see that it did a `git clone` right into the root workspace
> directory, adjacent to .m2. This was not hygienic. One important thing was
> that `git clean` would wipe the maven cache with every build. So in
> https://github.com/apache/beam/pull/3976 we changed it to
>
> /home/jenkins/jenkins-slave/workspace/$jobname
> /home/jenkins/jenkins-slave/workspace/$jobname/.m2
> /home/jenkins/jenkins-slave/workspace/$jobname/src/.git
>
> Now the .m2 directory survives and we do not constantly see flakes
> re-downloading deps that are immutable. This does, of course, use disk
> space.
>
> That was in the maven days. Gradle is the same except for $HOME/.m2 is
> replaced by $HOME/.gradle/caches/modules-2/files-2.1. Is Jenkins configured
> the same way so we will be wiping out the dependencies? If so, can you
> address this issue? Everything in that directory should be immutable and
> just a cache to avoid pointless re-download.
>
> Kenn
>
> On Tue, Jul 28, 2020 at 2:25 AM Damian Gadomski <
> damian.gadom...@polidea.com> wrote:
>
>> Agree with Udi, workspaces seem to be the third culprit, not yet
>> addressed in any way (until PR#12326
>>  is merged). I feel that
>> it'll solve the issue of filling up the disks for a long time ;)
>>
>> I'm also OK with moving /tmp cleanup to option B, and will happily
>> investigate on proper TMPDIR config.
>>
>>
>>
>> On Tue, Jul 28, 2020 at 3:07 AM Udi Meiri  wrote:
>>
>>> What about the workspaces, which can take up 175GB in some cases (see
>>> above)?
>>> I'm working on getting them cleaned up automatically:
>>> https://github.com/apache/beam/pull/12326
>>>
>>> My opinion is that we would get more mileage out of fixing the jobs that
>>> leave behind files in /tmp and images/containers in Docker.
>>> This would also help keep development machines clean.
>>>
>>>
>>> On Mon, Jul 27, 2020 at 5:31 PM Tyson Hamilton 
>>> wrote:
>>>
 Here is a summary of how I understand things,

   - /tmp and /var/lib/docker are the culprit for filling up disks
   - inventory Jenkins job runs every 12 hours and runs a docker prune
 to clean up images older than 24hr
   - crontab on each machine cleans up /tmp files older than three days
 weekly

 This doesn't seem to be working since we're still running out of disk
 periodically and requiring manual intervention. Knobs and options we have
 available:

   1. increase frequency of deleting files
   2. decrease the number of days required to delete a file (e.g. older
 than 2 days)

 The execution methods we have available are:

   A. cron
 - pro: runs even if a job gets stuck in Jenkins due to full disk
 - con: config baked into VM which is tough to update, not
 discoverable or documented well
   B. inventory job
 - pro: easy to update, runs every 12h already
 - con: could get stuck if Jenkins agent runs out of disk or is
 otherwise stuck, tied to all other inventory job frequency
   C. configure startup scripts for the VMs that set up the cron job
 anytime the VM is restarted
 - pro: similar to A. and easy to update
 - con: similar to A.

Re: No space left on device - beam-jenkins 1 and 7

2020-07-28 Thread Kenneth Knowles
Just checking - will this wipe out dependency cache? That will slow things
down and significantly increase flakiness. If I recall correctly, the
default Jenkins layout was:

/home/jenkins/jenkins-slave/workspace/$jobname
/home/jenkins/jenkins-slave/workspace/$jobname/.m2
/home/jenkins/jenkins-slave/workspace/$jobname/.git

Where you can see that it did a `git clone` right into the root workspace
directory, adjacent to .m2. This was not hygienic. One important thing was
that `git clean` would wipe the maven cache with every build. So in
https://github.com/apache/beam/pull/3976 we changed it to

/home/jenkins/jenkins-slave/workspace/$jobname
/home/jenkins/jenkins-slave/workspace/$jobname/.m2
/home/jenkins/jenkins-slave/workspace/$jobname/src/.git

Now the .m2 directory survives and we do not constantly see flakes
re-downloading deps that are immutable. This does, of course, use disk
space.

That was in the maven days. Gradle is the same except for $HOME/.m2 is
replaced by $HOME/.gradle/caches/modules-2/files-2.1. Is Jenkins configured
the same way so we will be wiping out the dependencies? If so, can you
address this issue? Everything in that directory should be immutable and
just a cache to avoid pointless re-download.

Kenn

On Tue, Jul 28, 2020 at 2:25 AM Damian Gadomski 
wrote:

> Agree with Udi, workspaces seem to be the third culprit, not yet addressed
> in any way (until PR#12326  is
> merged). I feel that it'll solve the issue of filling up the disks for a
> long time ;)
>
> I'm also OK with moving /tmp cleanup to option B, and will happily
> investigate on proper TMPDIR config.
>
>
>
> On Tue, Jul 28, 2020 at 3:07 AM Udi Meiri  wrote:
>
>> What about the workspaces, which can take up 175GB in some cases (see
>> above)?
>> I'm working on getting them cleaned up automatically:
>> https://github.com/apache/beam/pull/12326
>>
>> My opinion is that we would get more mileage out of fixing the jobs that
>> leave behind files in /tmp and images/containers in Docker.
>> This would also help keep development machines clean.
>>
>>
>> On Mon, Jul 27, 2020 at 5:31 PM Tyson Hamilton 
>> wrote:
>>
>>> Here is a summary of how I understand things,
>>>
>>>   - /tmp and /var/lib/docker are the culprit for filling up disks
>>>   - inventory Jenkins job runs every 12 hours and runs a docker prune to
>>> clean up images older than 24hr
>>>   - crontab on each machine cleans up /tmp files older than three days
>>> weekly
>>>
>>> This doesn't seem to be working since we're still running out of disk
>>> periodically and requiring manual intervention. Knobs and options we have
>>> available:
>>>
>>>   1. increase frequency of deleting files
>>>   2. decrease the number of days required to delete a file (e.g. older
>>> than 2 days)
>>>
>>> The execution methods we have available are:
>>>
>>>   A. cron
>>> - pro: runs even if a job gets stuck in Jenkins due to full disk
>>> - con: config baked into VM which is tough to update, not
>>> discoverable or documented well
>>>   B. inventory job
>>> - pro: easy to update, runs every 12h already
>>> - con: could get stuck if Jenkins agent runs out of disk or is
>>> otherwise stuck, tied to all other inventory job frequency
>>>   C. configure startup scripts for the VMs that set up the cron job
>>> anytime the VM is restarted
>>> - pro: similar to A. and easy to update
>>> - con: similar to A.
>>>
>>> Between the three I prefer B. because it is consistent with other
>>> inventory jobs. If it ends up that stuck jobs prohibit scheduling of the
>>> inventory job often we could further investigate C to avoid having to
>>> rebuild the VM images repeatedly.
>>>
>>> Any objections or comments? If not, we'll go forward with B. and reduce
>>> the date check from 3 days to 2 days.
>>>
>>>
>>> On 2020/07/24 20:13:29, Ahmet Altay  wrote:
>>> > Tests may not be doing docker cleanup. Inventory job runs a docker
>>> prune
>>> > every 12 hours for images older than 24 hrs [1]. Randomly looking at
>>> one of
>>> > the recent runs [2], it cleaned up a long list of containers consuming
>>> > 30+GB space. That should be just 12 hours worth of containers.
>>> >
>>> > [1]
>>> >
>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L69
>>> > [2]
>>> >
>>> https://ci-beam.apache.org/job/beam_Inventory_apache-beam-jenkins-14/501/console
>>> >
>>> > On Fri, Jul 24, 2020 at 1:07 PM Tyson Hamilton 
>>> wrote:
>>> >
>>> > > Yes, these are on the same volume in the /var/lib/docker directory.
>>> I'm
>>> > > unsure if they clean up leftover images.
>>> > >
>>> > > On Fri, Jul 24, 2020 at 12:52 PM Udi Meiri  wrote:
>>> > >
>>> > >> I forgot Docker images:
>>> > >>
>>> > >> ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df
>> > >> TYPE            TOTAL   ACTIVE  SIZE    RECLAIMABLE
>> > >> Images          88      9

Re: Use concrete instances of ExternalTransformBuilder in ExternalTransformRegistrar?

2020-07-28 Thread Maximilian Michels

Replacing

  Class<? extends ExternalTransformBuilder>

with

  ExternalTransformBuilder

sounds reasonable to me. Looks like an oversight that we introduced the 
unnecessary class indirection.


-Max

On 27.07.20 20:45, Chamikara Jayalath wrote:
Brian's suggestion makes sense to me. I don't know of a specific reason 
regarding why we chose the Class type in the registrar instead of
instance types. +Maximilian Michels  +Robert 
Bradshaw  may have more context.


Thanks,
Cham

On Mon, Jul 27, 2020 at 10:48 AM Kenneth Knowles wrote:




On Mon, Jul 27, 2020 at 10:47 AM Kenneth Knowles <k...@apache.org> wrote:

On Sun, Jul 26, 2020 at 8:50 PM Kenneth Knowles <k...@apache.org> wrote:

Rawtypes are a legacy compatibility feature that breaks type
checking (and further analyses)


Noting for the benefit of the thread that this is not
hypothetical. Fixing the rawtypes in this API surfaced
nullability issues according to spotbugs.


Additionally notable that Spotbugs operates on post-compile
bytecode, not source.

Kenn


Kenn



and harms readability. They should be forbidden in new code.
Class literals for generic types are quite inconvenient for
this, especially when placed in a heterogeneous map using
wildcard parameters [1].
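
For illustration, a minimal self-contained sketch of that inconvenience,
using hypothetical Builder/FooBuilder types rather than the real Beam API:

import java.util.HashMap;
import java.util.Map;

public class ClassLiteralSketch {
  interface Builder<ConfigT> { String build(ConfigT config); }

  static class FooConfig { String name = "foo"; }

  public static class FooBuilder implements Builder<FooConfig> {
    public String build(FooConfig config) { return "built " + config.name; }
  }

  public static void main(String[] args) throws Exception {
    // A class literal cannot carry its type arguments, so the registry's value
    // type degrades to a wildcarded (effectively raw) Class object.
    Map<String, Class<? extends Builder>> byClass = new HashMap<>();
    byClass.put("foo", FooBuilder.class);
    // Turning the class back into a usable builder requires reflection.
    Builder<?> viaReflection = byClass.get("foo").getDeclaredConstructor().newInstance();
    System.out.println(viaReflection.getClass().getSimpleName());

    // Registering concrete instances keeps the generics and needs no reflection.
    Map<String, Builder<?>> byInstance = new HashMap<>();
    byInstance.put("foo", new FooBuilder());
    System.out.println(((FooBuilder) byInstance.get("foo")).build(new FooConfig()));
  }
}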

So making either the change Brian proposes or something
similar is desirable, to avoid forcing inconvenience on
users of the API, and to just simplify and clarify it.

Kenn

[1]

https://github.com/apache/beam/pull/12376/files#diff-2fa38a7f8d24217f1f7bde0f5c7dbb40R495

Kenn

On Fri, Jul 24, 2020 at 11:04 AM Brian Hulette <bhule...@google.com> wrote:

Hi all,
I've been working with +Scott Lukas on using the new schema io
interfaces [1] in cross-language. This means adding a
general-purpose ExternalTransformRegistrar [2,3] that
will register all SchemaIOProvider implementations via
ServiceLoader.
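
As a rough sketch of that discovery step (using a hypothetical Provider
stand-in rather than the real SchemaIOProvider interface, whose exact
methods are not quoted here):

import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

public class ProviderDiscoverySketch {
  /** Hypothetical stand-in for an interface like SchemaIOProvider. */
  public interface Provider {
    String identifier();
  }

  /** Collects every Provider implementation on the classpath, keyed by id. */
  public static Map<String, Provider> discover() {
    Map<String, Provider> byId = new HashMap<>();
    // ServiceLoader picks up implementations listed under
    // META-INF/services/<fully.qualified.Provider> on the classpath.
    for (Provider provider : ServiceLoader.load(Provider.class)) {
      byId.put(provider.identifier(), provider);
    }
    return byId;
  }
}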

We've run into an issue though -
ExternalTransformRegistrar is supposed to return a
`Map<String, Class<? extends ExternalTransformBuilder>>`. This makes it very
challenging (impossible?) for us to create a
general-purpose ExternalTransformBuilder that defers to
SchemaIOProvider. Ideally we would instead return a
`Map<String, ExternalTransformBuilder>` (i.e. a concrete
instance rather than a class object), so that we could
just register different instances of a class like:

class SchemaIOBuilder<ConfigT> extends ExternalTransformBuilder {
   private SchemaIOProvider provider;
   PTransform buildExternal(ConfigT configuration) {
     // Use provider to create PTransform
   }
}

I think it would be possible to change the
ExternalTransformRegistrar interface so it has a single
method, Map<String, ExternalTransformBuilder>
knownBuilders(). It could even be done in a
backwards-compatible way if we keep the old method and
provide a default implementation of the new method that
builds instances.
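
A hedged sketch of what that backwards-compatible shape could look like
(the names RegistrarSketch, BuilderSketch, and knownBuilderInstances are
illustrative assumptions, not the actual Beam API):

import java.util.HashMap;
import java.util.Map;

public interface RegistrarSketch {
  /** Hypothetical stand-in for ExternalTransformBuilder. */
  interface BuilderSketch {}

  // Old style: implementations return builder classes.
  Map<String, Class<? extends BuilderSketch>> knownBuilders();

  // New style: implementations may override this to return concrete instances
  // (e.g. one builder per discovered SchemaIOProvider). The default keeps
  // existing registrars working by instantiating the registered classes.
  default Map<String, BuilderSketch> knownBuilderInstances() {
    Map<String, BuilderSketch> instances = new HashMap<>();
    for (Map.Entry<String, Class<? extends BuilderSketch>> entry :
        knownBuilders().entrySet()) {
      try {
        instances.put(
            entry.getKey(), entry.getValue().getDeclaredConstructor().newInstance());
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Could not instantiate " + entry.getValue(), e);
      }
    }
    return instances;
  }
}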

However, I'm curious if there's some strong reason for
using Class as the
return type for knownBuilders that I'm missing. Does
anyone know why we chose that?

Thanks,
Brian

[1] https://s.apache.org/beam-schema-io
[2]

https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ExternalTransformRegistrar.java
[3]

https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ExternalTransformBuilder.java



Re: Beam Jenkins Migration

2020-07-28 Thread Damian Gadomski
Ismael, there's still room for that (as well as for running multiple times
and taking the median as Valentyn proposed) as the jobs anyway fully occupy
one machine. The load statistics [1] show that currently that worker is
most of the time idle. As of now, last time the jobs were executed they all
took about 40 minutes [2]. That makes nearly 90% of idle time for the `
apache-beam-jenkins-16`, because they are triggered every 6 hours. That's
the state of the cron-triggered jobs listed here [2].
There are also `_PR` versions of these jobs, that share the DSL config, and
could be run from GitHub phrases: [3], [4], [5]. They are not tied to the
16th worker and spread on the rest of them, but that shouldn't be an issue,
either. As you can see in the history they are not triggered that often.

[1]
https://ci-beam.apache.org/computer/apache-beam-jenkins-16/load-statistics?type=min
[2] https://ci-beam.apache.org/label/beam-perf/
[3] https://ci-beam.apache.org/job/beam_PostCommit_Java_Nexmark_Flink_PR
[4] https://ci-beam.apache.org/job/beam_PostCommit_Java_Nexmark_Spark_PR
[5] https://ci-beam.apache.org/job/beam_PostCommit_Java_Nexmark_Direct_PR

On Mon, Jul 27, 2020 at 6:54 PM Valentyn Tymofieiev 
wrote:

> +1, thanks, Damian!
>
> > Are Spark and Flink runners benchmarking against local clusters on the
> Jenkins VMs?
>
> I believe that's the case and yes, the load on
> local-running benchmarks seems to be rather low, especially on some queries.
> Another avenue to improve the signal stability would be to run the
> benchmarks multiple times and analyze the 50th percentile of the readings.
>
> On Mon, Jul 27, 2020 at 9:47 AM Ismaël Mejía  wrote:
>
>> Great analysis Damian, thanks for taking a look and fixing this. Great
>> to know it was not anything related to Beam's code.
>>
>> I wonder if we should change the input size for the open
>> source runners (currently it is 1/10 of Dataflow's, which explains the big
>> difference in time), with the goal of detecting regressions better. The
>> current size is so small that adding 1s of extra time in some runs
>> looks like a 50-60% degradation, and we cannot know whether this is due to
>> some small CPU/GC pause or a real regression. I wonder, however,
>> if this will negatively impact the worker utilization.
>>
>>
>> On Mon, Jul 27, 2020 at 4:07 PM Damian Gadomski
>>  wrote:
>> >
>> > Hey all,
>> >
>> > I've done a few checks to pinpoint the issue and it seems that I've
>> just fixed it.
>> >
>> > Didn't know that before but the Flink, Spark and Direct Nexmark tests
>> are running on a special Jenkins worker. The `apache-beam-jenkins-16` is
>> labeled with `beam-perf`, so only these tests can execute there. I'm not
>> sure, because the configuration on the old CI is already gone, but I guess
>> that this worker was configured to have only one executor (which I had
>> missed). That would forbid concurrent execution of the jobs and
>> improve/stabilize the timings.
>> >
>> > That's how I currently configured the node and seems that the timings
>> are back to the pre-migration values:
>> http://104.154.241.245/d/ahuaA_zGz/nexmark?orgId=1=no:w-90d=now
>> >
>> > Dataflow was not affected because it wasn't restricted to run on
>> `apache-beam-jenkins-16`.
>> >
>> > Regards,
>> > Damian
>> >
>> >
>> > On Wed, Jul 22, 2020 at 5:11 PM Kenneth Knowles 
>> wrote:
>> >>
>> >> Are Spark and Flink runners benchmarking against local clusters on the
>> Jenkins VMs? Needless to say that is not a very controlled environment (and
>> of course not realistic scale). That is probably why Dataflow was not
>> affected. Is it possible that simply the different version of the Jenkins
>> worker software and/or the instructions from the Cloudbees instance cause
>> differing load?
>> >>
>> >> Kenn
>> >>
>> >> On Tue, Jul 21, 2020 at 4:17 PM Valentyn Tymofieiev <
>> valen...@google.com> wrote:
>> >>>
>> >>> FYI it looks like the transition to new Jenkins CI is visible on
>> Nexmark performance graphs[1][2]. Are new VM nodes less performant than old
>> ones?
>> >>>
>> >>> [1] http://
>> 104.154.241.245/d/ahuaA_zGz/nexmark?orgId=1=1587597387737=1595373387737=batch=All=All
>> >>> [2]
>> https://issues.apache.org/jira/browse/BEAM-10542?focusedCommentId=17162374=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17162374
>> >>>
>> >>> On Thu, Jun 18, 2020 at 3:32 PM Tyson Hamilton 
>> wrote:
>> 
>>  Currently no. We're already experiencing a backlog of builds so the
>> additional load would be a problem. I've opened two related issues that I
>> think need completion before allowing non-committers to trigger tests:
>> 
>>  Load sharing improvements:
>> https://issues.apache.org/jira/browse/BEAM-10281
>>  Admin access (maybe not required but nice to have):
>> https://issues.apache.org/jira/browse/BEAM-10280
>> 
>>  I created https://issues.apache.org/jira/browse/BEAM-10282 to track
>> opening up triggering for non-committers.
>> 
>>  On Thu, Jun 18, 

Re: No space left on device - beam-jenkins 1 and 7

2020-07-28 Thread Damian Gadomski
Agree with Udi, workspaces seem to be the third culprit, not yet addressed
in any way (until PR#12326  is
merged). I feel that it'll solve the issue of filling up the disks for a
long time ;)

I'm also OK with moving /tmp cleanup to option B, and will happily
investigate on proper TMPDIR config.



On Tue, Jul 28, 2020 at 3:07 AM Udi Meiri  wrote:

> What about the workspaces, which can take up 175GB in some cases (see
> above)?
> I'm working on getting them cleaned up automatically:
> https://github.com/apache/beam/pull/12326
>
> My opinion is that we would get more mileage out of fixing the jobs that
> leave behind files in /tmp and images/containers in Docker.
> This would also help keep development machines clean.
>
>
> On Mon, Jul 27, 2020 at 5:31 PM Tyson Hamilton  wrote:
>
>> Here is a summary of how I understand things,
>>
>>   - /tmp and /var/lib/docker are the culprit for filling up disks
>>   - inventory Jenkins job runs every 12 hours and runs a docker prune to
>> clean up images older than 24hr
>>   - crontab on each machine cleans up /tmp files older than three days
>> weekly
>>
>> This doesn't seem to be working since we're still running out of disk
>> periodically and requiring manual intervention. Knobs and options we have
>> available:
>>
>>   1. increase frequency of deleting files
>>   2. decrease the number of days required to delete a file (e.g. older
>> than 2 days)
>>
>> The execution methods we have available are:
>>
>>   A. cron
>> - pro: runs even if a job gets stuck in Jenkins due to full disk
>> - con: config baked into VM which is tough to update, not
>> discoverable or documented well
>>   B. inventory job
>> - pro: easy to update, runs every 12h already
>> - con: could get stuck if Jenkins agent runs out of disk or is
>> otherwise stuck, tied to all other inventory job frequency
>>   C. configure startup scripts for the VMs that set up the cron job
>> anytime the VM is restarted
>> - pro: similar to A. and easy to update
>> - con: similar to A.
>>
>> Between the three I prefer B. because it is consistent with other
>> inventory jobs. If it ends up that stuck jobs prohibit scheduling of the
>> inventory job often we could further investigate C to avoid having to
>> rebuild the VM images repeatedly.
>>
>> Any objections or comments? If not, we'll go forward with B. and reduce
>> the date check from 3 days to 2 days.
>>
>>
>> On 2020/07/24 20:13:29, Ahmet Altay  wrote:
>> > Tests may not be doing docker cleanup. Inventory job runs a docker prune
>> > every 12 hours for images older than 24 hrs [1]. Randomly looking at
>> one of
>> > the recent runs [2], it cleaned up a long list of containers consuming
>> > 30+GB space. That should be just 12 hours worth of containers.
>> >
>> > [1]
>> >
>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_Inventory.groovy#L69
>> > [2]
>> >
>> https://ci-beam.apache.org/job/beam_Inventory_apache-beam-jenkins-14/501/console
>> >
>> > On Fri, Jul 24, 2020 at 1:07 PM Tyson Hamilton 
>> wrote:
>> >
>> > > Yes, these are on the same volume in the /var/lib/docker directory.
>> I'm
>> > > unsure if they clean up leftover images.
>> > >
>> > > On Fri, Jul 24, 2020 at 12:52 PM Udi Meiri  wrote:
>> > >
>> > >> I forgot Docker images:
>> > >>
>> > >> ehudm@apache-ci-beam-jenkins-3:~$ sudo docker system df
>> > >> TYPE            TOTAL   ACTIVE  SIZE        RECLAIMABLE
>> > >> Images          88      9       125.4GB     124.2GB (99%)
>> > >> Containers      40      4       7.927GB     7.871GB (99%)
>> > >> Local Volumes   47      0       3.165GB     3.165GB (100%)
>> > >> Build Cache     0       0       0B          0B
>> > >>
>> > >> There are about 90 images on that machine, with all but 1 less than
>> 48
>> > >> hours old.
>> > >> I think the docker test jobs need to try harder at cleaning up their
>> > >> leftover images. (assuming they're already doing it?)
>> > >>
>> > >> On Fri, Jul 24, 2020 at 12:31 PM Udi Meiri  wrote:
>> > >>
>> > >>> The additional slots (@3 directories) take up even more space now
>> than
>> > >>> before.
>> > >>>
>> > >>> I'm testing out https://github.com/apache/beam/pull/12326 which
>> could
>> > >>> help by cleaning up workspaces after a run (just started a seed
>> job).
>> > >>>
>> > >>> On Fri, Jul 24, 2020 at 12:13 PM Tyson Hamilton > >
>> > >>> wrote:
>> > >>>
>> >  664Mbeam_PreCommit_JavaPortabilityApi_Commit
>> >  656Mbeam_PreCommit_JavaPortabilityApi_Commit@2
>> >  611Mbeam_PreCommit_JavaPortabilityApi_Cron
>> >  616Mbeam_PreCommit_JavaPortabilityApiJava11_Commit
>> >  598Mbeam_PreCommit_JavaPortabilityApiJava11_Commit@2
>> >  662Mbeam_PreCommit_JavaPortabilityApiJava11_Cron
>> >  2.9G