Re: Broken links in code velocity dashboard

2020-08-07 Thread Damian Gadomski
Unfortunately, I'm not aware of any recent changes.

On Thu, Aug 6, 2020 at 10:00 PM Ahmet Altay  wrote:

> Damian, or anyone else, do you know if there were other changes to the
> dashboard?
>
> I started to see closed PRs in the currently open PRs list (e.g.
> https://github.com/apache/beam/pull/12349,
> https://github.com/apache/beam/pull/12374). Not sure what is causing it,
> but it seems like a new issue.
>
> On Fri, Jul 31, 2020 at 10:13 AM Ahmet Altay  wrote:
>
>> Looks fixed. Thank you for the quick response!
>>
>> On Fri, Jul 31, 2020 at 7:05 AM Damian Gadomski <
>> damian.gadom...@polidea.com> wrote:
>>
>>> Oops. Sorry about that. Everything should be fixed now. URLs are correct
>>> and I've also removed wrong entries from the DB.
>>>
>>> If you're curious, you were right, it was mistakenly deployed from my
>>> fork. Actually, my private Jenkins instance did it. Will be more cautious
>>> with the jobs.
>>>
>>> Regards,
>>> Damian
>>>
>>> On Fri, Jul 31, 2020 at 1:32 AM Ahmet Altay  wrote:
>>>
>>>> Currently the open PRs section of the dashboard [1] seems to be broken.
>>>> PRs are linking to https://github.com/damgadbot/beam/issues/<PR NUMBER>
>>>> instead of https://github.com/apache/beam/pull/<PR NUMBER>. And the PR
>>>> list is showing PRs from the (https://github.com/damgadbot/beam) fork
>>>> in addition to the main repo.
>>>>
>>>> Dashboard was working normally last week, so probably something changed
>>>> recently. I do not see a code change in the dashboard [2]. I am not sure
>>>> what happened, maybe the dashboard was deployed from a fork?
>>>>
>>>> +Damian Gadomski  - based on the url
>>>> changes :)
>>>>
>>>> Thank you,
>>>> Ahmet
>>>>
>>>> [1]
>>>> http://metrics.beam.apache.org/d/code_velocity/code-velocity?orgId=1
>>>> [2]
>>>> https://github.com/apache/beam/blob/master/.test-infra/metrics/grafana/dashboards/code_velocity.json
>>>>
>>>


Re: Java Jenkins tests not running

2020-08-07 Thread Damian Gadomski
Hey,

We knew about the issue: no Jenkins builds were being triggered from pull
requests. There's a ticket for that if you're curious about the details:
INFRA-20649

It seems that we've just fixed it. I've also retriggered the tests from your
PR.

Regards,
Damian


On Fri, Aug 7, 2020 at 9:11 AM Reuven Lax  wrote:

> Does anyone know why Jenkins is not triggering any Java tests for
> pr/12474? It is only triggering python tasks, which is odd considering that
> this PR doesn't touch any python files.
>
> Reuven
>


Failing Python builds & AppEngine application

2020-08-06 Thread Damian Gadomski
Hey,

A strange thing happened a few hours ago. All python builds (e.g. [1])
started failing because of:

google.api_core.exceptions.NotFound: 404 The project apache-beam-testing
does not exist or it does not contain an active Cloud Datastore or Cloud
Firestore database. Please visit http://console.cloud.google.com to create
a project or
https://console.cloud.google.com/datastore/setup?project=apache-beam-testing
to add a Cloud Datastore or Cloud Firestore database. Note that Cloud
Datastore or Cloud Firestore always have an associated App Engine app and
this app must not be disabled.

I've checked that manually, and the same error appeared while accessing [2].
It seems that we are using Cloud Datastore, and indeed there was a default
AppEngine application [3] that was disabled, making Datastore inactive. I've
just re-enabled the app and Datastore became active again. Hopefully, that
will fix the builds. Based on the app statistics, it seems that someone
disabled it around Aug 5, 21:00 UTC.
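
For reference, a minimal connectivity probe in Python that reproduces the
NotFound error while the app is disabled (assuming the google-cloud-datastore
client and default credentials; the meta-kind query is only an illustration):

    from google.cloud import datastore

    # Querying the built-in __kind__ meta-kind touches the database without
    # depending on any particular entity; it raises NotFound while the
    # associated App Engine app (and thus Datastore) is disabled.
    client = datastore.Client(project="apache-beam-testing")
    kinds = list(client.query(kind="__kind__").fetch(limit=5))
    print("Datastore active; sample kinds:",
          [entity.key.id_or_name for entity in kinds])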

I recently saw the discussion on the dev list about performance monitoring.
The app [3] also serves the metrics at [4].
CC +Maximilian Michels , +Kamil Wasilewski
 - as you were involved in the discussion
there regarding [4]. Perhaps you know something more about this app, or at
least who may know? :)

[1] https://ci-beam.apache.org/job/beam_PostCommit_Python37/2681/console
[2] https://console.cloud.google.com/datastore?project=apache-beam-testing
[3]
https://console.cloud.google.com/appengine?project=apache-beam-testing&serviceId=default
[4] https://apache-beam-testing.appspot.com

Regards,
Damian


Re: No space left on device - beam-jenkins 1 and 7

2020-08-04 Thread Damian Gadomski
I did some research on temporary directories. It seems there's no single
unified way of telling applications to use a specific path, nor any
guarantee that all of them will use the dedicated custom directory. Badly
behaving apps could always hardcode `/tmp`, e.g. Java ;)

But we should be able to handle most of the cases by setting the TMPDIR env
variable (and also the less popular `TMP` and `TEMP`) and passing the Java
property `java.io.tmpdir` to the builds (see the sketch below).

There's even a plugin [1] that perfectly fits our needs, but it has been
unmaintained for 3 years and is not available in the Jenkins plugin
repository, so I'm not sure we want to use it anyway. Alternatively, we can
add the envs, the property, and the creation of the directory manually to
the DSL scripts.

[1] https://github.com/acrolinx/tmpdir-jenkins-plugin
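
A minimal sketch (in Python, with a hypothetical directory) of how the
TMPDIR resolution works, and of the JVM property that has to be passed
separately:

    import os
    import tempfile

    # Python's tempfile consults TMPDIR, then TEMP, then TMP before falling
    # back to /tmp, so exporting TMPDIR redirects well-behaved tools.
    custom = os.path.join(os.getcwd(), "build-tmp")  # hypothetical temp root
    os.makedirs(custom, exist_ok=True)
    os.environ["TMPDIR"] = custom
    tempfile.tempdir = None  # drop the cached value so gettempdir() re-reads
    print(tempfile.gettempdir())  # -> the custom directory

    # JVM processes ignore TMPDIR; they need the property passed explicitly,
    # e.g. ./gradlew test -Dorg.gradle.jvmargs="-Djava.io.tmpdir=$TMPDIR"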

On Wed, Jul 29, 2020 at 12:58 AM Kenneth Knowles  wrote:

> Cool. If it is /home/jenkins it should be just fine. Thanks for checking!
>
> Kenn
>
> On Tue, Jul 28, 2020 at 10:23 AM Damian Gadomski <
> damian.gadom...@polidea.com> wrote:
>
>> Sorry, mistake while copying, [1] should be:
>> [1]
>> https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L63
>>
>>
>> On Tue, Jul 28, 2020 at 7:21 PM Damian Gadomski <
>> damian.gadom...@polidea.com> wrote:
>>
>>> That's interesting. I didn't check that myself but all the Jenkins jobs
>>> are configured to wipe the workspace just before the actual build happens
>>> [1]
>>> <https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L6>.
>>> Git SCM plugin is used for that and it enables the option called "Wipe out
>>> repository and force clone". Docs state that it "deletes the contents of
>>> the workspace before build and before checkout" [2]
>>> <https://plugins.jenkins.io/git/>. Therefore I assume that removing
>>> workspace just after the build won't change anything.
>>>
>>> The ./.gradle/caches/modules-2/files-2.1 dir is indeed present on the
>>> worker machines but it's rather in /home/jenkins dir.
>>>
>>> damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
>>> 11G .
>>> damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
>>> caches/modules-2/files-2.1
>>> 2.3G caches/modules-2/files-2.1
>>>
>>> I can't find that directory structure inside workspaces.
>>>
>>> damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$
>>> sudo find -name "files-2.1"
>>> damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L6
>>> [2] https://plugins.jenkins.io/git/
>>>
>>> On Tue, Jul 28, 2020 at 5:47 PM Kenneth Knowles  wrote:
>>>
>>>> Just checking - will this wipe out dependency cache? That will slow
>>>> things down and significantly increase flakiness. If I recall correctly,
>>>> the default Jenkins layout was:
>>>>
>>>> /home/jenkins/jenkins-slave/workspace/$jobname
>>>> /home/jenkins/jenkins-slave/workspace/$jobname/.m2
>>>> /home/jenkins/jenkins-slave/workspace/$jobname/.git
>>>>
>>>> Where you can see that it did a `git clone` right into the root
>>>> workspace directory, adjacent to .m2. This was not hygienic. One important
>>>> thing was that `git clean` would wipe the maven cache with every build. So
>>>> in https://github.com/apache/beam/pull/3976 we changed it to
>>>>
>>>> /home/jenkins/jenkins-slave/workspace/$jobname
>>>> /home/jenkins/jenkins-slave/workspace/$jobname/.m2
>>>> /home/jenkins/jenkins-slave/workspace/$jobname/src/.git
>>>>
>>>> Now the .m2 directory survives and we do not constantly see flakes
>>>> re-downloading deps that are immutable. This does, of course, use disk
>>>> space.
>>>>
>>>> That was in the maven days. Gradle is the same except for $HOME/.m2 is
>>>> replaced by $HOME/.gradle/caches/modules-2/files-2.1. Is Jenkins configured
>>>> the same way so we will be wiping out the dependencies? If so, can you
>>>> address this issue? Everything in that directory should be immutable and
>>>> just a cache to avoid pointless re-download.
>>>>
>>>> Kenn
>>>>

Re: Beam Dependency Check Report (2020-08-03)

2020-08-03 Thread Damian Gadomski
That's probably caused by this PR [1]: the workspace had been deleted before
the email was sent.

+Udi Meiri  Moving the workspace cleanup to the very end
of the post-build actions should help.

[1] https://github.com/apache/beam/pull/12326

On Mon, Aug 3, 2020 at 5:42 PM Brian Hulette  wrote:

> Does anyone know what went wrong here? It looks like the
> associated jenkins job [1] succeeded, and produced
> beam-dependency-check-report.html
>
> [1] https://ci-beam.apache.org/job/beam_Dependency_Check/279/
>
> On Mon, Aug 3, 2020 at 5:28 AM Apache Jenkins Server <
> jenk...@builds.apache.org> wrote:
>
>> ERROR: File
>> 'src/build/dependencyUpdates/beam-dependency-check-report.html' does not
>> exist
>
>


Re: Broken links in code velocity dashboard

2020-07-31 Thread Damian Gadomski
Oops. Sorry about that. Everything should be fixed now. URLs are correct
and I've also removed wrong entries from the DB.

If you're curious, you were right, it was mistakenly deployed from my fork.
Actually, my private Jenkins instance did it. Will be more cautious with
the jobs.

Regards,
Damian

On Fri, Jul 31, 2020 at 1:32 AM Ahmet Altay  wrote:

> Currently the open PRs section of the dashboard [1] seems to be broken.
> PRs are linking to https://github.com/damgadbot/beam/issues/<PR NUMBER>
> instead of https://github.com/apache/beam/pull/<PR NUMBER>. And the PR
> list is showing PRs from the (https://github.com/damgadbot/beam) fork in
> addition to the main repo.
>
> Dashboard was working normally last week, so probably something changed
> recently. I do not see a code change in the dashboard [2]. I am not sure
> what happened, maybe the dashboard was deployed from a fork?
>
> +Damian Gadomski  - based on the url changes
> :)
>
> Thank you,
> Ahmet
>
> [1] http://metrics.beam.apache.org/d/code_velocity/code-velocity?orgId=1
> [2]
> https://github.com/apache/beam/blob/master/.test-infra/metrics/grafana/dashboards/code_velocity.json
>


Re: No space left on device - beam-jenkins 1 and 7

2020-07-28 Thread Damian Gadomski
Sorry, mistake while copying, [1] should be:
[1]
https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L63


On Tue, Jul 28, 2020 at 7:21 PM Damian Gadomski 
wrote:

> That's interesting. I didn't check that myself but all the Jenkins jobs
> are configured to wipe the workspace just before the actual build happens
> [1]
> <https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L6>.
> Git SCM plugin is used for that and it enables the option called "Wipe out
> repository and force clone". Docs state that it "deletes the contents of
> the workspace before build and before checkout" [2]
> <https://plugins.jenkins.io/git/>. Therefore I assume that removing
> workspace just after the build won't change anything.
>
> The ./.gradle/caches/modules-2/files-2.1 dir is indeed present on the
> worker machines but it's rather in /home/jenkins dir.
>
> damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
> 11G .
> damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
> caches/modules-2/files-2.1
> 2.3G caches/modules-2/files-2.1
>
> I can't find that directory structure inside workspaces.
>
> damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$
> sudo find -name "files-2.1"
> damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$
>
> [1]
> https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L6
> [2] https://plugins.jenkins.io/git/
>
> On Tue, Jul 28, 2020 at 5:47 PM Kenneth Knowles  wrote:
>
>> Just checking - will this wipe out dependency cache? That will slow
>> things down and significantly increase flakiness. If I recall correctly,
>> the default Jenkins layout was:
>>
>> /home/jenkins/jenkins-slave/workspace/$jobname
>> /home/jenkins/jenkins-slave/workspace/$jobname/.m2
>> /home/jenkins/jenkins-slave/workspace/$jobname/.git
>>
>> Where you can see that it did a `git clone` right into the root workspace
>> directory, adjacent to .m2. This was not hygienic. One important thing was
>> that `git clean` would wipe the maven cache with every build. So in
>> https://github.com/apache/beam/pull/3976 we changed it to
>>
>> /home/jenkins/jenkins-slave/workspace/$jobname
>> /home/jenkins/jenkins-slave/workspace/$jobname/.m2
>> /home/jenkins/jenkins-slave/workspace/$jobname/src/.git
>>
>> Now the .m2 directory survives and we do not constantly see flakes
>> re-downloading deps that are immutable. This does, of course, use disk
>> space.
>>
>> That was in the maven days. Gradle is the same except for $HOME/.m2 is
>> replaced by $HOME/.gradle/caches/modules-2/files-2.1. Is Jenkins configured
>> the same way so we will be wiping out the dependencies? If so, can you
>> address this issue? Everything in that directory should be immutable and
>> just a cache to avoid pointless re-download.
>>
>> Kenn
>>
>> On Tue, Jul 28, 2020 at 2:25 AM Damian Gadomski <
>> damian.gadom...@polidea.com> wrote:
>>
>>> Agree with Udi, workspaces seem to be the third culprit, not yet
>>> addressed in any way (until PR#12326
>>> <https://github.com/apache/beam/pull/12326> is merged). I feel that
>>> it'll solve the issue of filling up the disks for a long time ;)
>>>
>>> I'm also OK with moving /tmp cleanup to option B, and will happily
>>> investigate on proper TMPDIR config.
>>>
>>>
>>>
>>> On Tue, Jul 28, 2020 at 3:07 AM Udi Meiri  wrote:
>>>
>>>> What about the workspaces, which can take up 175GB in some cases (see
>>>> above)?
>>>> I'm working on getting them cleaned up automatically:
>>>> https://github.com/apache/beam/pull/12326
>>>>
>>>> My opinion is that we would get more mileage out of fixing the jobs
>>>> that leave behind files in /tmp and images/containers in Docker.
>>>> This would also help keep development machines clean.
>>>>
>>>>
>>>> On Mon, Jul 27, 2020 at 5:31 PM Tyson Hamilton 
>>>> wrote:
>>>>
>>>>> Here is a summary of how I understand things,
>>>>>
>>>>>   - /tmp and /var/lib/docker are the culprit for filling up disks
>>>>>   - inventory Jenkins job runs every 12 hours and runs a docker prune
>>>>> to clean up images older than 24hr
>>>>>   - 

Re: No space left on device - beam-jenkins 1 and 7

2020-07-28 Thread Damian Gadomski
That's interesting. I didn't check that myself but all the Jenkins jobs are
configured to wipe the workspace just before the actual build happens [1]
<https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L6>.
Git SCM plugin is used for that and it enables the option called "Wipe out
repository and force clone". Docs state that it "deletes the contents of
the workspace before build and before checkout" [2]
<https://plugins.jenkins.io/git/>. Therefore I assume that removing
workspace just after the build won't change anything.

The ./.gradle/caches/modules-2/files-2.1 dir is indeed present on the
worker machines, but it's in the /home/jenkins dir rather than in the
workspaces.

damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
11G .
damgad@apache-ci-beam-jenkins-13:/home/jenkins/.gradle$ sudo du -sh
caches/modules-2/files-2.1
2.3G caches/modules-2/files-2.1

I can't find that directory structure inside workspaces.

damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$
sudo find -name "files-2.1"
damgad@apache-ci-beam-jenkins-13:/home/jenkins/jenkins-slave/workspace$

[1]
https://github.com/apache/beam/blob/8aca8ccc7f1a14516ad769b63845ddd4dc163d92/.test-infra/jenkins/CommonJobProperties.groovy#L6
[2] https://plugins.jenkins.io/git/

On Tue, Jul 28, 2020 at 5:47 PM Kenneth Knowles  wrote:

> Just checking - will this wipe out dependency cache? That will slow things
> down and significantly increase flakiness. If I recall correctly, the
> default Jenkins layout was:
>
> /home/jenkins/jenkins-slave/workspace/$jobname
> /home/jenkins/jenkins-slave/workspace/$jobname/.m2
> /home/jenkins/jenkins-slave/workspace/$jobname/.git
>
> Where you can see that it did a `git clone` right into the root workspace
> directory, adjacent to .m2. This was not hygienic. One important thing was
> that `git clean` would wipe the maven cache with every build. So in
> https://github.com/apache/beam/pull/3976 we changed it to
>
> /home/jenkins/jenkins-slave/workspace/$jobname
> /home/jenkins/jenkins-slave/workspace/$jobname/.m2
> /home/jenkins/jenkins-slave/workspace/$jobname/src/.git
>
> Now the .m2 directory survives and we do not constantly see flakes
> re-downloading deps that are immutable. This does, of course, use disk
> space.
>
> That was in the maven days. Gradle is the same except for $HOME/.m2 is
> replaced by $HOME/.gradle/caches/modules-2/files-2.1. Is Jenkins configured
> the same way so we will be wiping out the dependencies? If so, can you
> address this issue? Everything in that directory should be immutable and
> just a cache to avoid pointless re-download.
>
> Kenn
>
> On Tue, Jul 28, 2020 at 2:25 AM Damian Gadomski <
> damian.gadom...@polidea.com> wrote:
>
>> Agree with Udi, workspaces seem to be the third culprit, not yet
>> addressed in any way (until PR#12326
>> <https://github.com/apache/beam/pull/12326> is merged). I feel that
>> it'll solve the issue of filling up the disks for a long time ;)
>>
>> I'm also OK with moving /tmp cleanup to option B, and will happily
>> investigate on proper TMPDIR config.
>>
>>
>>
>> On Tue, Jul 28, 2020 at 3:07 AM Udi Meiri  wrote:
>>
>>> What about the workspaces, which can take up 175GB in some cases (see
>>> above)?
>>> I'm working on getting them cleaned up automatically:
>>> https://github.com/apache/beam/pull/12326
>>>
>>> My opinion is that we would get more mileage out of fixing the jobs that
>>> leave behind files in /tmp and images/containers in Docker.
>>> This would also help keep development machines clean.
>>>
>>>
>>> On Mon, Jul 27, 2020 at 5:31 PM Tyson Hamilton 
>>> wrote:
>>>
>>>>> Here is a summary of how I understand things,
>>>>
>>>>   - /tmp and /var/lib/docker are the culprit for filling up disks
>>>>   - inventory Jenkins job runs every 12 hours and runs a docker prune
>>>> to clean up images older than 24hr
>>>>   - crontab on each machine cleans up /tmp files older than three days
>>>> weekly
>>>>
>>>> This doesn't seem to be working since we're still running out of disk
>>>> periodically and requiring manual intervention. Knobs and options we have
>>>> available:
>>>>
>>>>   1. increase frequency of deleting files
>>>>   2. decrease the number of days required to delete a file (e.g. older
>>>> than 2 days)
>>>>
>>>> The execution methods we have available are:
>>>>
>>>>   A. cron
>

Re: Beam Jenkins Migration

2020-07-28 Thread Damian Gadomski
Ismael, there's still room for that (as well as for running multiple times
and taking the median, as Valentyn proposed), since the jobs fully occupy
one machine anyway. The load statistics [1] show that the worker is
currently idle most of the time. The last time the jobs were executed, they
each took about 40 minutes [2]. Since they are triggered every 6 hours, that
leaves `apache-beam-jenkins-16` idle nearly 90% of the time. That's the
state of the cron-triggered jobs listed here [2].
There are also `_PR` versions of these jobs, which share the DSL config and
can be run via GitHub phrases: [3], [4], [5]. They are not tied to the 16th
worker and are spread across the rest of them, but that shouldn't be an
issue either. As you can see in the history, they are not triggered that
often.

[1]
https://ci-beam.apache.org/computer/apache-beam-jenkins-16/load-statistics?type=min
[2] https://ci-beam.apache.org/label/beam-perf/
[3] https://ci-beam.apache.org/job/beam_PostCommit_Java_Nexmark_Flink_PR
[4] https://ci-beam.apache.org/job/beam_PostCommit_Java_Nexmark_Spark_PR
[5] https://ci-beam.apache.org/job/beam_PostCommit_Java_Nexmark_Direct_PR

On Mon, Jul 27, 2020 at 6:54 PM Valentyn Tymofieiev 
wrote:

> +1, thanks, Damian!
>
> > Are Spark and Flink runners benchmarking against local clusters on the
> Jenkins VMs?
>
> I believe that's the case and yes, the load on
> local-running benchmarks seems to be rather low, especially on some queries.
> Another avenue to improve the signal stability would be to run the
> benchmarks multiple times and analyze the 50th percentile of the readings.
>
> On Mon, Jul 27, 2020 at 9:47 AM Ismaël Mejía  wrote:
>
>> Great analysis Damian thanks for taking a look and fixing this. Great
>> to know it was not anything related to Beam's code.
>>
>> I wonder if we should probably change the input size for the open
>> source runners (currently is 1/10 of Dataflow, that explains the big
>> difference on time), with the goal of detecting regressions better,
>> the current size is so small that adding 1s of extra time in some runs
>> looks like a 50-60% degradation and we cannot know if this is due to
>> some small small CPU/GC pause or a real regression. I wonder however
>> if this will impact negatively the worker utilization.
>>
>>
>> On Mon, Jul 27, 2020 at 4:07 PM Damian Gadomski
>>  wrote:
>> >
>> > Hey all,
>> >
>> > I've done a few checks to pinpoint the issue and it seems that I've
>> just fixed it.
>> >
>> > Didn't know that before but the Flink, Spark and Direct Nexmark tests
>> are running on special Jenkins worker. The `apache-beam-jenkins-16` is
>> labeled with `beam-perf`, so only these tests can execute there. I'm not
>> sure, because the configuration on the old CI is already gone, but I guess
>> that this worker was configured to have only one executor (which I had
>> missed). That would forbid concurrent execution of the jobs and
>> improve/stabilize the timings.
>> >
>> > That's how I currently configured the node and seems that the timings
>> are back to the pre-migration values:
>> http://104.154.241.245/d/ahuaA_zGz/nexmark?orgId=1&from=now-90d&to=now
>> >
>> > Dataflow was not affected because it wasn't restricted to run on
>> `apache-beam-jenkins-16`.
>> >
>> > Regards,
>> > Damian
>> >
>> >
>> > On Wed, Jul 22, 2020 at 5:11 PM Kenneth Knowles 
>> wrote:
>> >>
>> >> Are Spark and Flink runners benchmarking against local clusters on the
>> Jenkins VMs? Needless to say that is not a very controlled environment (and
>> of course not realistic scale). That is probably why Dataflow was not
>> affected. Is it possible that simply the different version of the Jenkins
>> worker software and/or the instructions from the Cloudbees instance cause
>> differing load?
>> >>
>> >> Kenn
>> >>
>> >> On Tue, Jul 21, 2020 at 4:17 PM Valentyn Tymofieiev <
>> valen...@google.com> wrote:
>> >>>
>> >>> FYI it looks like the transition to new Jenkins CI is visible on
>> Nexmark performance graphs[1][2]. Are new VM nodes less performant than old
>> ones?
>> >>>
>> >>> [1] http://
>> 104.154.241.245/d/ahuaA_zGz/nexmark?orgId=1=1587597387737=1595373387737=batch=All=All
>> >>> [2]
>> https://issues.apache.org/jira/browse/BEAM-10542?focusedCommentId=17162374&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17162374
>> >>>
>> >>> On Thu, Jun 18, 2020 at 3:32 PM Tyson Hamilton 
>> wrote:

Re: No space left on device - beam-jenkins 1 and 7

2020-07-28 Thread Damian Gadomski
>> > >>>>>>>>>>>>2. Configure your jobs to only keep 5 or 10 previous
>> builds.
>> > >>>>>>>>>>>>3. Configure your jobs to only keep 5 or 10 previous
>> > >>>>>>>>>>>>artifacts.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> [1]:
>> > >>>>>>>>>>>>
>> https://cwiki.apache.org/confluence/display/INFRA/Disk+Space+cleanup+of+Jenkins+nodes
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> On Wed, Jul 22, 2020 at 8:06 AM Kenneth Knowles <
>> > >>>>>>>>>>>> k...@apache.org> wrote:
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>> Those file listings look like the result of using standard
>> > >>>>>>>>>>>>> temp file APIs but with TMPDIR set to /tmp.
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> On Mon, Jul 20, 2020 at 7:55 PM Tyson Hamilton <
>> > >>>>>>>>>>>>> tyso...@google.com> wrote:
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> Jobs are hermetic as far as I can tell and use unique
>> > >>>>>>>>>>>>>> subdirectories inside of /tmp. Here is a quick look into
>> two examples:
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> @apache-ci-beam-jenkins-4:/tmp$ sudo du -ah --time . |
>> sort
>> > >>>>>>>>>>>>>> -rhk 1,1 | head -n 20
>> > >>>>>>>>>>>>>> 1.6G2020-07-21 02:25.
>> > >>>>>>>>>>>>>> 242M2020-07-17 18:48
>> ./beam-pipeline-temp3ybuY4
>> > >>>>>>>>>>>>>> 242M2020-07-17 18:46
>> ./beam-pipeline-tempuxjiPT
>> > >>>>>>>>>>>>>> 242M2020-07-17 18:44
>> ./beam-pipeline-tempVpg1ME
>> > >>>>>>>>>>>>>> 242M2020-07-17 18:42
>> ./beam-pipeline-tempJ4EpyB
>> > >>>>>>>>>>>>>> 242M2020-07-17 18:39
>> ./beam-pipeline-tempepea7Q
>> > >>>>>>>>>>>>>> 242M2020-07-17 18:35
>> ./beam-pipeline-temp79qot2
>> > >>>>>>>>>>>>>> 236M2020-07-17 18:48
>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmpy_Ytzz
>> > >>>>>>>>>>>>>> 236M2020-07-17 18:46
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpN5_UfJ
>> > >>>>>>>>>>>>>> 236M2020-07-17 18:44
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpxSm8pX
>> > >>>>>>>>>>>>>> 236M2020-07-17 18:42
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempJ4EpyB/tmpMZJU76
>> > >>>>>>>>>>>>>> 236M2020-07-17 18:39
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempepea7Q/tmpWy1vWX
>> > >>>>>>>>>>>>>> 236M2020-07-17 18:35
>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp79qot2/tmpvN7vWA
>> > >>>>>>>>>>>>>> 3.7M2020-07-17 18:48
>> > >>>>>>>>>>>>>>  ./beam-pipeline-temp3ybuY4/tmprlh_di
>> > >>>>>>>>>>>>>> 3.7M2020-07-17 18:46
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempuxjiPT/tmpLmVWfe
>> > >>>>>>>>>>>>>> 3.7M2020-07-17 18:44
>> > >>>>>>>>>>>>>>  ./beam-pipeline-tempVpg1ME/tmpvrxbY7
>> > >>>>>>>>>>

Re: Beam Jenkins Migration

2020-07-27 Thread Damian Gadomski
Hey all,

I've done a few checks to pinpoint the issue and it seems that I've just
fixed it.

I didn't know this before, but the Flink, Spark, and Direct Nexmark tests run
on a special Jenkins worker: `apache-beam-jenkins-16` is labeled with
`beam-perf`, so only these tests can execute there. I'm not sure, because the
configuration on the old CI is already gone, but I guess that this worker was
configured to have only one executor (which I had missed). That would prevent
concurrent execution of the jobs and improve/stabilize the timings.

That's how I've now configured the node, and the timings seem to be back to
the pre-migration values:
http://104.154.241.245/d/ahuaA_zGz/nexmark?orgId=1&from=now-90d&to=now

Dataflow was not affected because it wasn't restricted to run on
`apache-beam-jenkins-16`.

Regards,
Damian


On Wed, Jul 22, 2020 at 5:11 PM Kenneth Knowles  wrote:

> Are Spark and Flink runners benchmarking against local clusters on the
> Jenkins VMs? Needless to say that is not a very controlled environment (and
> of course not realistic scale). That is probably why Dataflow was not
> affected. Is it possible that simply the different version of the Jenkins
> worker software and/or the instructions from the Cloudbees instance cause
> differing load?
>
> Kenn
>
> On Tue, Jul 21, 2020 at 4:17 PM Valentyn Tymofieiev 
> wrote:
>
>> FYI it looks like the transition to new Jenkins CI is visible on Nexmark
>> performance graphs[1][2]. Are new VM nodes less performant than old ones?
>>
>> [1] http://
>> 104.154.241.245/d/ahuaA_zGz/nexmark?orgId=1=1587597387737=1595373387737=batch=All=All
>> [2]
>> https://issues.apache.org/jira/browse/BEAM-10542?focusedCommentId=17162374&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17162374
>>
>> On Thu, Jun 18, 2020 at 3:32 PM Tyson Hamilton 
>> wrote:
>>
>>> Currently no. We're already experiencing a backlog of builds so the
>>> additional load would be a problem. I've opened two related issues that I
>>> think need completion before allowing non-committers to trigger tests:
>>>
>>> Load sharing improvements:
>>> https://issues.apache.org/jira/browse/BEAM-10281
>>> Admin access (maybe not required but nice to have):
>>> https://issues.apache.org/jira/browse/BEAM-10280
>>>
>>> I created https://issues.apache.org/jira/browse/BEAM-10282 to track
>>> opening up triggering for non-committers.
>>>
>>> On Thu, Jun 18, 2020 at 3:30 PM Luke Cwik  wrote:
>>>
>>>> Was about to ask the same question, so can non-committers trigger the
>>>> tests now?
>>>>
>>>> On Thu, Jun 18, 2020 at 11:54 AM Heejong Lee 
>>>> wrote:
>>>>
>>>>> This is awesome. Could non-committers also trigger the test now?
>>>>>
>>>>> On Wed, Jun 17, 2020 at 6:12 AM Damian Gadomski <
>>>>> damian.gadom...@polidea.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Good news, we've just migrated to the new CI:
>>>>>> https://ci-beam.apache.org. As from now beam projects at
>>>>>> builds.apache.org are disabled.
>>>>>>
>>>>>> If you experience any issues with the new setup please let me know,
>>>>>> either here or on ASF slack.
>>>>>>
>>>>>> Regards,
>>>>>> Damian
>>>>>>
>>>>>> On Mon, Jun 15, 2020 at 10:40 PM Damian Gadomski <
>>>>>> damian.gadom...@polidea.com> wrote:
>>>>>>
>>>>>>> Happy to see your positive response :)
>>>>>>>
>>>>>>> @Udi Meiri, Thanks for pointing that out. I've checked it and indeed
>>>>>>> it needs some attention.
>>>>>>>
>>>>>>> There are two things basing on my research:
>>>>>>>
>>>>>>>- data uploaded by performance and load tests by the jobs,
>>>>>>>directly to the influx DB - that should be handled automatically as 
>>>>>>> new
>>>>>>>jobs will upload the same data in the same way
>>>>>>>- data fetched using Jenkins API by the metrics tool
>>>>>>>(syncjenkins.py) - here the situation is a bit more complex as the 
>>>>>>> script
>>>>>>>relies on the build number (it's used actually as a time

Re: Jenkins trigger phrase "run seed job" not working?

2020-07-23 Thread Damian Gadomski
Oh, with our new Jenkins that's not an issue - I have admin access there; the
issue is with checking the old CI configuration, GitHub, infra stuff, etc.
FYI, all PMC members also have admin access to the new CI and can install
plugins.

This 'fetching committers during seed job' solution should not
disable phrases. That will work as expected.

On Thu, Jul 23, 2020 at 10:58 PM Udi Meiri  wrote:

> I have the same issue with Jenkins privileges. There's usually no insight
> to test triggering logic.
> For instance I happen to know that tests won't be started right now
> because Infra is restarting Jenkins to install a plugin, but that's only
> because I opened the ticket.
>
> I think fetching the list of allowed user IDs as part of the seed job is
> okay. Even if this disables phrases we can always manually trigger the seed
> job from the Jenkins UI.
>
> On Thu, Jul 23, 2020 at 1:11 PM Damian Gadomski <
> damian.gadom...@polidea.com> wrote:
>
>> Yes, I thought that whitelisting apache organization will do the trick,
>> but apparently, it doesn't. Actually, it makes sense as we want to allow
>> only beam committers and not all apache committers. I don't know the
>> implications of membership in the apache github organization, but you for
>> instance are not there :) Neither is Ahmet.
>>
>>
>> Therefore there's nothing wrong with the Ghprb plugin, it correctly
>> forbade triggering. From my investigation, the "beam-committers" GitHub
>> team (which is under the apache org) is the list of people that should be
>> allowed. But firstly, you can't whitelist a team with Ghprb. There's a
>> ticket for that, open for 5 years
>> <https://github.com/jenkinsci/ghprb-plugin/issues/160>. I could
>> implement that but, secondly, the team is secret. I can't even see it. Even
>> asfbot doesn't have permission to see it.
>>
>> You may ask, how it worked before, because on the builds.apache.org
>> somehow only committers were allowed to trigger PR builds. It appeared that
>> Infra created a webhook relay. It's configured here
>> <https://github.com/apache/infrastructure-puppet/blob/deployment/modules/gitbox/files/conf/relay.yaml>
>>  and
>> it filters out all the non-committers events. I wish I had known that
>> before as it was also the reason for different issues during the migration.
>> Anyway, it would be hard to use that mechanism in our case as we want to
>> configure it depending on the job.
>>
>>
>> There's a publicly available source of committers list - it's LDAP. I've
>> tested it and it allows anonymous connection and provides the list of the
>> committers as well as the github usernames. My current idea is to read this
>> from LDAP as a part of the seed job and configure the jobs with the apache
>> committers present on the ghprb whitelist.
>>
>>
>> Hope that I didn't miss anything ;) It isn't that easy to investigate
>> that kind of issues with my poor privileges ;)
>>
>>
>> Regards,
>>
>> Damian
>>
>>
>> On Thu, Jul 23, 2020 at 6:52 PM Udi Meiri  wrote:
>>
>>> Thanks Damian! I saw that the config also has this:
>>>   orgWhitelist(['apache'])
>>> Shouldn't that be enough to allow all Apache committers?
>>>
>>> I traced the code for the membership check here:
>>>
>>> https://github.com/jenkinsci/ghprb-plugin/blob/4e86ed47a96a01eeaa51a479ff604252109635f6/src/main/java/org/jenkinsci/plugins/ghprb/GhprbGitHub.java#L27
>>> Is there a way to see these logs?
>>>
>>>
>>> On Thu, Jul 23, 2020 at 7:08 AM Damian Gadomski <
>>> damian.gadom...@polidea.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> You are right, the current behavior is wrong, I'm currently working to
>>>> fix it asap. Our intention was to disable that only for non-committers.
>>>>
>>>> As a workaround, as a committer, you could manually add yourself (your
>>>> GitHub username) to the whitelist of the SeedJob configuration:
>>>> https://ci-beam.apache.org/job/beam_SeedJob/configure
>>>> Then, your comment "Run Seed Job" will trigger the build. I've already
>>>> manually triggered it for you that way.
>>>>
>>>> Of course, it will only work until the seed job gets executed - it will
>>>> then override the whitelist with an empty one.
>>>>
>>>> [image: Selection_408.png]
>>>>
>>>> As a target solution, I'm planning to fetch the list of beam committers
>>>> from LDAP and automatically add them to the whitelist above as a part of
>>>> the seed job. I'll keep you updated about the progress.
>>>>
>>>> Regards,
>>>> Damian
>>>>
>>>>
>>>> On Wed, Jul 22, 2020 at 11:03 PM Ahmet Altay  wrote:
>>>>
>>>>> +Damian Gadomski , it might be related
>>>>> to this change: https://github.com/apache/beam/pull/12319.
>>>>>
>>>>> /cc +Tyson Hamilton 
>>>>>
>>>>> On Wed, Jul 22, 2020 at 1:17 PM Udi Meiri  wrote:
>>>>>
>>>>>> HI,
>>>>>> I'm trying to test a groovy change but I can't seem to trigger the
>>>>>> seed job. It worked yesterday so I'm not sure what changed.
>>>>>>
>>>>>> https://github.com/apache/beam/pull/12326
>>>>>>
>>>>>>


Re: Jenkins trigger phrase "run seed job" not working?

2020-07-23 Thread Damian Gadomski
Yes, I thought that whitelisting the apache organization would do the trick,
but apparently it doesn't. Actually, it makes sense, as we want to allow
only Beam committers and not all Apache committers. I don't know the
implications of membership in the apache GitHub organization, but you, for
instance, are not there :) Neither is Ahmet.


Therefore there's nothing wrong with the Ghprb plugin; it correctly forbade
triggering. From my investigation, the "beam-committers" GitHub team (which
is under the apache org) is the list of people that should be allowed. But
firstly, you can't whitelist a team with Ghprb. There's a ticket for that,
open for 5 years <https://github.com/jenkinsci/ghprb-plugin/issues/160>. I
could implement that, but, secondly, the team is secret. I can't even see
it. Even asfbot doesn't have permission to see it.

You may ask how it worked before, because on builds.apache.org somehow only
committers were allowed to trigger PR builds. It turns out that Infra
created a webhook relay. It's configured here
<https://github.com/apache/infrastructure-puppet/blob/deployment/modules/gitbox/files/conf/relay.yaml>
and it filters out all non-committer events. I wish I had known that before,
as it was also the cause of various issues during the migration. Anyway, it
would be hard to use that mechanism in our case, as we want to configure it
per job.


There's a publicly available source of the committers list - LDAP. I've
tested it: it allows anonymous connections and provides the list of
committers as well as their GitHub usernames. My current idea is to read
this from LDAP as part of the seed job and configure the jobs with the
Apache committers on the ghprb whitelist (see the sketch below).
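
For illustration, a minimal sketch of such a lookup using the Python `ldap3`
library; the host, base DN, and attribute names below are assumptions based
on public ASF LDAP conventions, not verified values:

    from ldap3 import Connection, Server

    # Anonymous bind, as described above; endpoint and DN layout are assumed.
    server = Server("ldaps://ldap-us-ro.apache.org")
    with Connection(server, auto_bind=True) as conn:
        conn.search(
            search_base="ou=project,ou=groups,dc=apache,dc=org",
            search_filter="(cn=beam)",
            attributes=["member"],
        )
        # Each member DN would then be resolved to a GitHub username.
        members = [str(m) for entry in conn.entries for m in entry.member]
        print(members)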


Hope that I didn't miss anything ;) It isn't that easy to investigate this
kind of issue with my limited privileges ;)


Regards,

Damian


On Thu, Jul 23, 2020 at 6:52 PM Udi Meiri  wrote:

> Thanks Damian! I saw that the config also has this:
>   orgWhitelist(['apache'])
> Shouldn't that be enough to allow all Apache committers?
>
> I traced the code for the membership check here:
>
> https://github.com/jenkinsci/ghprb-plugin/blob/4e86ed47a96a01eeaa51a479ff604252109635f6/src/main/java/org/jenkinsci/plugins/ghprb/GhprbGitHub.java#L27
> Is there a way to see these logs?
>
>
> On Thu, Jul 23, 2020 at 7:08 AM Damian Gadomski <
> damian.gadom...@polidea.com> wrote:
>
>> Hi,
>>
>> You are right, the current behavior is wrong, I'm currently working to
>> fix it asap. Our intention was to disable that only for non-committers.
>>
>> As a workaround, as a committer, you could manually add yourself (your
>> GitHub username) to the whitelist of the SeedJob configuration:
>> https://ci-beam.apache.org/job/beam_SeedJob/configure
>> Then, your comment "Run Seed Job" will trigger the build. I've already
>> manually triggered it for you that way.
>>
>> Of course, it will only work until the seed job gets executed - it will
>> then override the whitelist with an empty one.
>>
>> [image: Selection_408.png]
>>
>> As a target solution, I'm planning to fetch the list of beam committers
>> from LDAP and automatically add them to the whitelist above as a part of
>> the seed job. I'll keep you updated about the progress.
>>
>> Regards,
>> Damian
>>
>>
>> On Wed, Jul 22, 2020 at 11:03 PM Ahmet Altay  wrote:
>>
>>> +Damian Gadomski , it might be related to
>>> this change: https://github.com/apache/beam/pull/12319.
>>>
>>> /cc +Tyson Hamilton 
>>>
>>> On Wed, Jul 22, 2020 at 1:17 PM Udi Meiri  wrote:
>>>
>>>> HI,
>>>> I'm trying to test a groovy change but I can't seem to trigger the seed
>>>> job. It worked yesterday so I'm not sure what changed.
>>>>
>>>> https://github.com/apache/beam/pull/12326
>>>>
>>>>


Re: Jenkins trigger phrase "run seed job" not working?

2020-07-23 Thread Damian Gadomski
Hi,

You are right, the current behavior is wrong; I'm currently working to fix
it asap. Our intention was to disable triggering only for non-committers.

As a workaround, as a committer, you could manually add yourself (your
GitHub username) to the whitelist of the SeedJob configuration:
https://ci-beam.apache.org/job/beam_SeedJob/configure
Then, your comment "Run Seed Job" will trigger the build. I've already
manually triggered it for you that way.

Of course, it will only work until the seed job gets executed - it will
then override the whitelist with an empty one.

[image: Selection_408.png]

As a target solution, I'm planning to fetch the list of beam committers
from LDAP and automatically add them to the whitelist above as a part of
the seed job. I'll keep you updated about the progress.

Regards,
Damian


On Wed, Jul 22, 2020 at 11:03 PM Ahmet Altay  wrote:

> +Damian Gadomski , it might be related to
> this change: https://github.com/apache/beam/pull/12319.
>
> /cc +Tyson Hamilton 
>
> On Wed, Jul 22, 2020 at 1:17 PM Udi Meiri  wrote:
>
>> HI,
>> I'm trying to test a groovy change but I can't seem to trigger the seed
>> job. It worked yesterday so I'm not sure what changed.
>>
>> https://github.com/apache/beam/pull/12326
>>
>>


Re: Beam Jenkins Migration

2020-07-22 Thread Damian Gadomski
Hey, thanks for pointing that out. As I replied in the issue, the nodes
should have exactly the same configuration. They are all `n1-highmem-16 (16
vCPUs, 104 GB memory)` - exactly as on the old CI. They were also created
from the same disk images and the disk type is also the same (Standard
persistent disk, 500GB).

We increased the number of executors on the nodes (and therefore the number
of concurrent jobs running on them), but that's unrelated, as we did it a
few days after the migration. The performance graphs show an immediate
effect.

Regards,
Damian

On Wed, Jul 22, 2020 at 1:17 AM Valentyn Tymofieiev 
wrote:

> FYI it looks like the transition to new Jenkins CI is visible on Nexmark
> performance graphs[1][2]. Are new VM nodes less performant than old ones?
>
> [1] http://
> 104.154.241.245/d/ahuaA_zGz/nexmark?orgId=1=1587597387737=1595373387737=batch=All=All
> [2]
> https://issues.apache.org/jira/browse/BEAM-10542?focusedCommentId=17162374&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17162374
>
> On Thu, Jun 18, 2020 at 3:32 PM Tyson Hamilton  wrote:
>
>> Currently no. We're already experiencing a backlog of builds so the
>> additional load would be a problem. I've opened two related issues that I
>> think need completion before allowing non-committers to trigger tests:
>>
>> Load sharing improvements:
>> https://issues.apache.org/jira/browse/BEAM-10281
>> Admin access (maybe not required but nice to have):
>> https://issues.apache.org/jira/browse/BEAM-10280
>>
>> I created https://issues.apache.org/jira/browse/BEAM-10282 to track
>> opening up triggering for non-committers.
>>
>> On Thu, Jun 18, 2020 at 3:30 PM Luke Cwik  wrote:
>>
>>> Was about to ask the same question, so can non-committers trigger the
>>> tests now?
>>>
>>> On Thu, Jun 18, 2020 at 11:54 AM Heejong Lee  wrote:
>>>
>>>> This is awesome. Could non-committers also trigger the test now?
>>>>
>>>> On Wed, Jun 17, 2020 at 6:12 AM Damian Gadomski <
>>>> damian.gadom...@polidea.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Good news, we've just migrated to the new CI:
>>>>> https://ci-beam.apache.org. As from now beam projects at
>>>>> builds.apache.org are disabled.
>>>>>
>>>>> If you experience any issues with the new setup please let me know,
>>>>> either here or on ASF slack.
>>>>>
>>>>> Regards,
>>>>> Damian
>>>>>
>>>>> On Mon, Jun 15, 2020 at 10:40 PM Damian Gadomski <
>>>>> damian.gadom...@polidea.com> wrote:
>>>>>
>>>>>> Happy to see your positive response :)
>>>>>>
>>>>>> @Udi Meiri, Thanks for pointing that out. I've checked it and indeed
>>>>>> it needs some attention.
>>>>>>
>>>>>> There are two things basing on my research:
>>>>>>
>>>>>>- data uploaded by performance and load tests by the jobs,
>>>>>>directly to the influx DB - that should be handled automatically as 
>>>>>> new
>>>>>>jobs will upload the same data in the same way
>>>>>>- data fetched using Jenkins API by the metrics tool
>>>>>>(syncjenkins.py) - here the situation is a bit more complex as the 
>>>>>> script
>>>>>>relies on the build number (it's used actually as a time reference and
>>>>>>primary key in the DB is created from it). To avoid refactoring of the
>>>>>>script and database migration to use timestamp instead of build 
>>>>>> number I've
>>>>>>just "fast-forwarded" the numbers on the new
>>>>>>https://ci-beam.apache.org to follow current numbering from the
>>>>>>old CI. Therefore simple replacement of the Jenkins URL in the metrics
>>>>>>scripts should do the trick to have continuous metrics data. I'll 
>>>>>> check
>>>>>>that tomorrow on my local grafana instance.
>>>>>>
>>>>>> Please let me know if there's anything that I missed.
>>>>>>
>>>>>> Regards,
>>>>>> Damian
>>>>>>
>>>>>> On Mon, Jun 15, 2020 at 8:05 PM Alexey Romanenko <
>>>>>> aromanenko@gmail.com> wrote:
>>>>>>
>>>>>>> Great! Thank you for wo

Re: No space left on device - beam-jenkins 1 and 7

2020-07-20 Thread Damian Gadomski
Hey,

I've recently created a solution for the growing /tmp directory. Part of it
is the job mentioned by Tyson: *beam_Clean_tmp_directory*. It's
intentionally not triggered by cron and should be a last-resort solution
for unusual cases.

Along with that job, I've also updated every worker with an internal cron
script. It runs once a week and deletes all the files (and only files) that
have not been accessed for at least three days. That's designed to be as
safe as possible for the jobs running on the worker (so as not to delete
files that are still in use) and also to be insensitive to the current
workload on the machine: the cleanup will always happen, even if some
long-running/stuck jobs are blocking the machine. A sketch of the policy is
below.
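
A minimal sketch of that policy in Python (the path and three-day threshold
mirror the description above; the real cron script may differ):

    import os
    import time

    # Delete only regular files (never directories or symlinks) whose last
    # access time is more than three days old.
    CUTOFF = time.time() - 3 * 24 * 60 * 60

    for root, _dirs, files in os.walk("/tmp"):
        for name in files:
            path = os.path.join(root, name)
            try:
                if not os.path.islink(path) and os.stat(path).st_atime < CUTOFF:
                    os.remove(path)
            except OSError:
                pass  # raced with a running build or no permission; skip it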

I also think that the current "No space left" errors may be a consequence of
the growing workspace directory rather than /tmp. I haven't done any
detailed analysis, but, for example, on apache-beam-jenkins-7 the workspace
directory is currently 158 GB while /tmp is only 16 GB. We should either
guarantee that the disk is large enough to hold the workspaces of all jobs
(because eventually every worker will execute each job) or also clean up
the workspaces in some way.

Regards,
Damian


On Mon, Jul 20, 2020 at 10:43 AM Maximilian Michels  wrote:

> +1 for scheduling it via a cron job if it won't lead to test failures
> while running. Not a Jenkins expert but maybe there is the notion of
> running exclusively while no other tasks are running?
>
> -Max
>
> On 17.07.20 21:49, Tyson Hamilton wrote:
> > FYI there was a job introduced to do this in Jenkins:
> beam_Clean_tmp_directory
> >
> > Currently it needs to be run manually. I'm seeing some out of disk
> related errors in precommit tests currently, perhaps we should schedule
> this job with cron?
> >
> >
> > On 2020/03/11 19:31:13, Heejong Lee  wrote:
> >> Still seeing no space left on device errors on jenkins-7 (for example:
> >> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/2754/)
> >>
> >>
> >> On Fri, Mar 6, 2020 at 7:11 PM Alan Myrvold 
> wrote:
> >>
> >>> Did a one time cleanup of tmp files owned by jenkins older than 3 days.
> >>> Agree that we need a longer term solution.
> >>>
> >>> Passing recent tests on all executors except jenkins-12, which has not
> >>> scheduled recent builds for the past 13 days. Not scheduling:
> >>> https://builds.apache.org/computer/apache-beam-jenkins-12/builds
> >>> Recent passing builds:
> >>> https://builds.apache.org/computer/apache-beam-jenkins-1/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-2/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-3/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-4/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-5/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-6/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-7/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-8/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-9/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-10/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-11/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-13/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-14/builds
> >>> https://builds.apache.org/computer/apache-beam-jenkins-15/builds

contributor permission for Beam Jira tickets

2020-07-09 Thread Damian Gadomski
Hi,

Can I be added to the JIRA contributors so I can assign tickets to myself,
please?

my Jira username: damgad

Thanks,
Damian


cwiki edit access

2020-06-19 Thread Damian Gadomski
Hi,

I would like to have write (edit) access on the cwiki. I'm planning to
update the Jenkins URLs as we've migrated recently.

The username is: dgadomski

Thanks,
Damian


Re: Beam Jenkins Migration

2020-06-17 Thread Damian Gadomski
Hello,

Good news: we've just migrated to the new CI: https://ci-beam.apache.org.
As of now, the Beam projects at builds.apache.org are disabled.

If you experience any issues with the new setup please let me know, either
here or on ASF slack.

Regards,
Damian

On Mon, Jun 15, 2020 at 10:40 PM Damian Gadomski <
damian.gadom...@polidea.com> wrote:

> Happy to see your positive response :)
>
> @Udi Meiri, Thanks for pointing that out. I've checked it and indeed it
> needs some attention.
>
> There are two things basing on my research:
>
>- data uploaded by performance and load tests by the jobs, directly to
>the influx DB - that should be handled automatically as new jobs will
>upload the same data in the same way
>- data fetched using Jenkins API by the metrics tool (syncjenkins.py)
>- here the situation is a bit more complex as the script relies on the
>build number (it's used actually as a time reference and primary key in the
>DB is created from it). To avoid refactoring of the script and database
>migration to use timestamp instead of build number I've just
>"fast-forwarded" the numbers on the new https://ci-beam.apache.org to
>follow current numbering from the old CI. Therefore simple replacement of
>the Jenkins URL in the metrics scripts should do the trick to have
>continuous metrics data. I'll check that tomorrow on my local grafana
>instance.
>
> Please let me know if there's anything that I missed.
>
> Regards,
> Damian
>
> On Mon, Jun 15, 2020 at 8:05 PM Alexey Romanenko 
> wrote:
>
>> Great! Thank you for working on this and letting us know.
>>
>> On 12 Jun 2020, at 16:58, Damian Gadomski 
>> wrote:
>>
>> Hello,
>>
>> During the last few days, I was preparing for the Beam Jenkins migration
>> from builds.apache.org to ci-beam.apache.org. The new Jenkins Master
>> will be dedicated only for Beam related jobs, all Beam Committers will have
>> build configure access, and Beam PMC will have Admin (GUI) Access.
>>
>> We (in cooperation with Infra) are almost ready for the migration itself
>> and I want to share with you the details of our plan. We are planning to
>> start the migration next week, most likely on Tuesday. I'll keep you
>> updated on the progress. We do not expect any issues nor the outage of the
>> CI services, everything should be more or less unnoticeable. Just don't be
>> surprised that the Jenkins URL will change to https://ci-beam.apache.org
>>
>> If you are curious, here are the steps that we are going to take:
>>
>> 1. Create 16 new CI nodes that will be connected to the new CI. We will
>> then have simultaneously running two CI servers.
>> 2. Verify that new builds work as expected on the new instance (compare
>> results of cron builds). (a day or two would be sufficient)
>> 3. Move the responsibility of Phrase/PR/Commit builds to the new CI,
>> disable on the old one.
>> 4. Modify the .test-infra/jenkins/README.md to point to the new instance
>> and replace Post-commit tests status in README.md and
>> .github/PULL_REQUEST_TEMPLATE.md
>> 5. Disable the jobs on the old Jenkins and add a description to each job
>> with the URL to the corresponding one on the new CI.
>> 6. Turn off VM instances of the old nodes.
>> 7. Remove VM instances of the old nodes.
>>
>> In case of any questions or doubts feel free to ask :)
>>
>> Regards,
>> Damian
>>
>>
>>


Re: Beam Jenkins Migration

2020-06-15 Thread Damian Gadomski
Happy to see your positive response :)

@Udi Meiri, Thanks for pointing that out. I've checked it and indeed it
needs some attention.

There are two things, based on my research:

   - data uploaded directly to InfluxDB by the performance and load test
   jobs - that should be handled automatically, as the new jobs will upload
   the same data in the same way
   - data fetched via the Jenkins API by the metrics tool (syncjenkins.py) -
   here the situation is a bit more complex, as the script relies on the
   build number (it's actually used as a time reference, and the primary key
   in the DB is created from it; see the sketch below). To avoid refactoring
   the script and migrating the database to use timestamps instead of build
   numbers, I've just "fast-forwarded" the numbers on the new
   https://ci-beam.apache.org to follow the current numbering from the old
   CI. Therefore, simply replacing the Jenkins URL in the metrics scripts
   should do the trick to keep the metrics data continuous. I'll check that
   tomorrow on my local grafana instance.

Please let me know if there's anything that I missed.

Regards,
Damian

On Mon, Jun 15, 2020 at 8:05 PM Alexey Romanenko 
wrote:

> Great! Thank you for working on this and letting us know.
>
> On 12 Jun 2020, at 16:58, Damian Gadomski 
> wrote:
>
> Hello,
>
> During the last few days, I was preparing for the Beam Jenkins migration
> from builds.apache.org to ci-beam.apache.org. The new Jenkins Master will
> be dedicated only for Beam related jobs, all Beam Committers will have
> build configure access, and Beam PMC will have Admin (GUI) Access.
>
> We (in cooperation with Infra) are almost ready for the migration itself
> and I want to share with you the details of our plan. We are planning to
> start the migration next week, most likely on Tuesday. I'll keep you
> updated on the progress. We do not expect any issues nor the outage of the
> CI services, everything should be more or less unnoticeable. Just don't be
> surprised that the Jenkins URL will change to https://ci-beam.apache.org
>
> If you are curious, here are the steps that we are going to take:
>
> 1. Create 16 new CI nodes that will be connected to the new CI. We will
> then have simultaneously running two CI servers.
> 2. Verify that new builds work as expected on the new instance (compare
> results of cron builds). (a day or two would be sufficient)
> 3. Move the responsibility of Phrase/PR/Commit builds to the new CI,
> disable on the old one.
> 4. Modify the .test-infra/jenkins/README.md to point to the new instance
> and replace Post-commit tests status in README.md and
> .github/PULL_REQUEST_TEMPLATE.md
> 5. Disable the jobs on the old Jenkins and add a description to each job
> with the URL to the corresponding one on the new CI.
> 6. Turn off VM instances of the old nodes.
> 7. Remove VM instances of the old nodes.
>
> In case of any questions or doubts feel free to ask :)
>
> Regards,
> Damian
>
>
>


Beam Jenkins Migration

2020-06-12 Thread Damian Gadomski
Hello,

During the last few days, I have been preparing for the Beam Jenkins
migration from builds.apache.org to ci-beam.apache.org. The new Jenkins
master will be dedicated only to Beam-related jobs, all Beam committers
will have build configure access, and the Beam PMC will have admin (GUI)
access.

We (in cooperation with Infra) are almost ready for the migration itself,
and I want to share the details of our plan with you. We are planning to
start the migration next week, most likely on Tuesday. I'll keep you
updated on the progress. We do not expect any issues or outages of the CI
services; everything should be more or less unnoticeable. Just don't be
surprised that the Jenkins URL will change to https://ci-beam.apache.org

If you are curious, here are the steps that we are going to take:

1. Create 16 new CI nodes that will be connected to the new CI. We will
then have simultaneously running two CI servers.
2. Verify that new builds work as expected on the new instance (compare
results of cron builds). (a day or two would be sufficient)
3. Move the responsibility of Phrase/PR/Commit builds to the new CI,
disable on the old one.
4. Modify the .test-infra/jenkins/README.md to point to the new instance
and replace Post-commit tests status in README.md and
.github/PULL_REQUEST_TEMPLATE.md
5. Disable the jobs on the old Jenkins and add a description to each job
with the URL to the corresponding one on the new CI.
6. Turn off VM instances of the old nodes.
7. Remove VM instances of the old nodes.

In case of any questions or doubts feel free to ask :)

Regards,
Damian