Hi,
As a first reaction to this, and to get more value out of Jenkins, I
disabled email notifications and Jira issue reporting for our Jenkins
Matrix jobs [1, 2]. The jobs are still there, and I suggest everyone
takes a look at them once in a while.
At the same time I set up a new Jenkins job, which is much lighter as
it only runs the unit tests on trunk [3]. The job is triggered at every
commit and usually completes after about 25 minutes. Currently the job
sends a notification to @oak-dev should it fail and I might experiment
with adding Jira issue reporting to it (but might hit INFRA-13599 [4]).
So far this job has proved very stable (no failures in the past 20
builds). The stability together with the quick turnaround should give us
fast feedback on regressions. Any failure reported by this job is thus a
signal for immediate action.
Michael
[1] https://builds.apache.org/view/All/job/Apache%20Jackrabbit%20Oak%20matrix/
[2] https://builds.apache.org/view/All/job/Oak-Win/
[3] https://builds.apache.org/view/J/job/Jackrabbit%20Oak/
[4] https://issues.apache.org/jira/browse/INFRA-13599
On 28.02.17 12:31, Michael Dürig wrote:
Hi,
To get an overview of what is going on with our Jenkins instances, what
value they provide and how much effort they generate, I broke down the
issues reported by them along various axes.
There were 327 issues reported between 8.12.16 and 28.2.17. With 82
days this amounts to almost 4 issues a day. Note that this number is
quite biased as that time period includes the Christmas break, when we
didn't have much activity. The correct numbers are probably closer to 72
days and 4.5 issues per day.
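The two rates can be re-derived with a few lines of Python. This is only a minimal sketch: it assumes the window is counted as the plain difference between the two dates, and it takes the 72 active days as the rough estimate given above, not as a measured value.

```python
from datetime import date

TOTAL_ISSUES = 327

# Reporting window quoted in the mail (8.12.16 to 28.2.17).
start = date(2016, 12, 8)
end = date(2017, 2, 28)
calendar_days = (end - start).days

# Estimate from the mail: the window minus the Christmas break.
active_days = 72

rate_calendar = TOTAL_ISSUES / calendar_days
rate_active = TOTAL_ISSUES / active_days

print(f"{calendar_days} calendar days: {rate_calendar:.2f} issues/day")
print(f"{active_days} active days:   {rate_active:.2f} issues/day")
```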
To me the most striking thing in the breakdowns below is the high number
of duplicates (256 / 78%) and the high number of infrastructure-related
issues (84 / 26%). To me this means we are spending too much time
triaging issues and hunting down infrastructure problems.
Of the 25 fixed issues only 4 were actual regressions, two of which
were caused by missing license headers, a problem that our release
process would also have caught.
Finally all numbers are further biased because the Jenkins Jira
notification plugin itself fails sometimes [1] (frequently?), which
causes build failures not to be reported.
Michael
Issues by resolution:
256 Duplicates (172 test failures / 84 infra issues)
 27 Unresolved ( 21 test failures /  6 infra issues)
 25 Fixed
 15 CI and infra issue
  4 Rare test artefacts
Infra issues (84):
32 Backing channel disconnected
20 JVM crash
12 File name too long
 6 Failed silently
 4 Artifact resolution error
 4 Maven failure
 3 Timeout (120 min.)
 2 Disk full
 1 Checksum mismatch
Fixed issues (25):
 4 bug / regression (OAK-5339, OAK-5540, OAK-5241, OAK-5471)
 7 timing
14 test artefact
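As a sanity check on the breakdowns, the counts can be tallied in Python. This is a minimal sketch that only re-adds the figures listed above; the dictionary names and the `share` helper are illustrative and not part of any tooling we use.

```python
# Counts copied from the breakdowns in this mail.
by_resolution = {
    "Duplicates": 256,
    "Unresolved": 27,
    "Fixed": 25,
    "CI and infra issue": 15,
    "Rare test artefacts": 4,
}
infra_issues = {
    "Backing channel disconnected": 32,
    "JVM crash": 20,
    "File name too long": 12,
    "Failed silently": 6,
    "Artifact resolution error": 4,
    "Maven failure": 4,
    "Timeout (120 min.)": 3,
    "Disk full": 2,
    "Checksum mismatch": 1,
}
fixed_issues = {"bug / regression": 4, "timing": 7, "test artefact": 14}


def share(count, total):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


total = sum(by_resolution.values())
infra_total = sum(infra_issues.values())
fixed_total = sum(fixed_issues.values())

print(f"total issues: {total}")
print(f"duplicates:   {share(by_resolution['Duplicates'], total)}%")
print(f"infra issues: {infra_total} ({share(infra_total, total)}%)")
print(f"fixed issues: {fixed_total}")
```

The sums agree with the totals quoted in the text: 327 issues overall, 84 infra issues and 25 fixed issues.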
[1]
ERROR: Build step failed with exception
java.lang.NullPointerException
    at hudson.plugins.jira.JiraCreateIssueNotifier.getStatus(JiraCreateIssueNotifier.java:218)
    at hudson.plugins.jira.JiraCreateIssueNotifier.currentBuildResultSuccess(JiraCreateIssueNotifier.java:387)
    at hudson.plugins.jira.JiraCreateIssueNotifier.perform(JiraCreateIssueNotifier.java:159)
    at hudson.tasks.BuildStepMonitor$3.perform(BuildStepMonitor.java:45)
    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:779)
    at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:720)
    at hudson.model.Build$BuildExecution.post2(Build.java:185)
    at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:665)
    at hudson.model.Run.execute(Run.java:1753)
    at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:404)
Build step 'JIRA: Create issue' marked build as failure
Finished: FAILURE