Re: [Development] Qt CI reliability

2016-05-09 Thread Jędrzej Nowacki
> Determining architecture... ()
> make: Warning: File `../../../qt/qtbase/config.tests/arch/arch.pro' has
> modification time 1.3e+03 s in the future
> /home/qt/work/build/bin/qmake -qtconf /home/qt/work/build/bin/qt.conf
> -nocache -spec /home/qt/work/qt/qtbase/mkspecs/linux-g++ LIBS+=
> QMAKE_CXXFLAGS+= INCLUDEPATH+= CONFIG-=app_bundle -o Makefile
> ../../../qt/qtbase/config.tests/arch/arch.pro
> ...
> 
> agent:2016/05/05 13:07:20 build.go:221: Killed process after timeout (total
> time)
> agent:2016/05/05 13:07:20 agent.go:170: Build failed
> agent:2016/05/05 13:07:20 agent.go:127: ERROR building: Timeout after 5m0s:
> Maximum duration reached
> agent:2016/05/05 13:07:20 build.go:158: Error reading from stdout/err:
> Timeout after 5m0s: Maximum duration reached
> 
> Sean

The date and time on the nodes were out of sync. They are still out of sync, 
but that should not affect the build anymore: while unzipping the source code 
we now set ctime and atime 30 years into the past. The "fix" is deployed; if 
you see the problem again, ping me.
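
For illustration only, here is a minimal Python sketch of the kind of 
backdating described above. It is not the actual CI code. The function name 
backdate_tree and the constant THIRTY_YEARS_S are hypothetical, and it assumes 
the timestamp that matters to make is the modification time: os.utime sets 
atime and mtime, while ctime cannot be set directly on POSIX systems.

```python
import os
import time

# Hypothetical offset: roughly the 30 years mentioned in the message above.
THIRTY_YEARS_S = 30 * 365 * 24 * 60 * 60


def backdate_tree(root, offset_s=THIRTY_YEARS_S, now=None):
    """Set atime and mtime of every regular file under root to now - offset_s.

    Returns the number of files whose timestamps were changed.
    """
    now = time.time() if now is None else now
    stamp = now - offset_s
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks; a dangling link would raise
            os.utime(path, (stamp, stamp))  # (atime, mtime)
            count += 1
    return count
```

Run against a freshly unpacked source tree, this makes every file look old to 
make regardless of how far the node's clock lags behind the machine that 
created the archive.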

Cheers,
 Jędrek
___
Development mailing list
Development@qt-project.org
http://lists.qt-project.org/mailman/listinfo/development


Re: [Development] Qt CI reliability

2016-05-05 Thread Tuukka Turunen


> -----Original Message-----
> From: Development [mailto:development-
> bounces+tuukka.turunen=qt...@qt-project.org] On Behalf Of Sean Harmer
> Sent: Thursday 5 May 2016 15:19
> To: development@qt-project.org
> Subject: Re: [Development] Qt CI reliability
> 
> On Thursday 05 May 2016 13:01:24 Sean Harmer wrote:
> > And more problems today:
> >
> > http://testresults.qt.io/logs/qt/qt3d/83275e6eb72baa5f80d4fbaaf84bc0ffd1b59762/LinuxUbuntu_15_04x86_64LinuxUbuntuTouch_15_04armv7GCCDisableTests_OpenGLES2/f09f2056189e7df3613966a77c6b671ff3d5ea88/buildlog.txt.gz
> >
> > results in a 404. I hope this doesn't last all of the holiday weekend...
> 
> And now another qt3d integration fails with:
> 
> Module "qt/qtdeclarative" (9a7cf067a178c7a08a7ed9f2c6253e1feade5569) did not
> compile:
>  Could not parse build log :(
> 
> Is anybody able to poke the system please to see what's going wrong?
> 

I am afraid that it is again a public holiday at all our main R&D sites. 
Someone may be working today, but most developers are away. I fully agree 
with you that such things should not happen in the first place.

Yours,

Tuukka

> Sean
> 
> >
> > Sean
> >
> > On Thursday 05 May 2016 11:15:38 Sean Harmer wrote:
> > > On Tuesday 03 May 2016 17:36:49 Jędrzej Nowacki wrote:
> > > > On Tuesday 03 of May 2016 14:54:50 Sean Harmer wrote:
> > > > > Do you mean VM template? If so then yes that's again something
> > > > > that should ideally be verified before deployment.
> > > >
> > > > Eh, wrong phrasing. Templates are tested. The problem is that the
> > > > current process is a bit racy. Updating template is a work that
> > > > require time, especially for testing and deployment. Regressions
> > > > may appear during that time window. Anyway the process is about to
> > > > be changed soon.
> > >
> > > Here's another new failure mode:
> > >
> > > http://testresults.qt.io/logs/qt/qtbase/3356adaae5186a77c2aec458e2a1ebd3e40db8ab/LinuxUbuntu_14_04x86_64LinuxUbuntu_14_04x86_64GCCDeveloperBuild_OutOfSourceBuild_QtLibInfix_QtNamespace/85d6b000f945a84bc84a4f01f53ac65bc05cbe86/buildlog.txt.gz
> > >
> > > Failing with:
> > >
> > > Determining architecture... ()
> > > make: Warning: File `../../../qt/qtbase/config.tests/arch/arch.pro'
> > > has modification time 1.3e+03 s in the future
> > > /home/qt/work/build/bin/qmake -qtconf
> > > /home/qt/work/build/bin/qt.conf -nocache -spec
> > > /home/qt/work/qt/qtbase/mkspecs/linux-g++ LIBS+= QMAKE_CXXFLAGS+=
> > > INCLUDEPATH+= CONFIG-=app_bundle -o Makefile
> > > ../../../qt/qtbase/config.tests/arch/arch.pro
> > > ...
> > >
> > > agent:2016/05/05 13:07:20 build.go:221: Killed process after timeout
> > > (total time)
> > > agent:2016/05/05 13:07:20 agent.go:170: Build failed
> > > agent:2016/05/05 13:07:20 agent.go:127: ERROR building: Timeout after
> > > 5m0s: Maximum duration reached
> > > agent:2016/05/05 13:07:20 build.go:158: Error reading from stdout/err:
> > > Timeout after 5m0s: Maximum duration reached
> > >
> > > Sean
> 
> --
> Dr Sean Harmer | sean.har...@kdab.com | Managing Director UK
> KDAB (UK) Ltd, a KDAB Group company
> Tel. +44 (0)1625 809908; Sweden (HQ) +46-563-540090
> Mobile: +44 (0)7545 140604
> KDAB - Qt Experts


Re: [Development] Qt CI reliability

2016-05-05 Thread Sean Harmer
On Thursday 05 May 2016 13:01:24 Sean Harmer wrote:
> And more problems today:
> 
> http://testresults.qt.io/logs/qt/qt3d/83275e6eb72baa5f80d4fbaaf84bc0ffd1b59762/LinuxUbuntu_15_04x86_64LinuxUbuntuTouch_15_04armv7GCCDisableTests_OpenGLES2/f09f2056189e7df3613966a77c6b671ff3d5ea88/buildlog.txt.gz
> 
> results in a 404. I hope this doesn't last all of the holiday weekend...

And now another qt3d integration fails with:

Module "qt/qtdeclarative" (9a7cf067a178c7a08a7ed9f2c6253e1feade5569) did not 
compile:
 Could not parse build log :(

Is anybody able to poke the system please to see what's going wrong?

Sean

> 
> Sean
> 
> On Thursday 05 May 2016 11:15:38 Sean Harmer wrote:
> > On Tuesday 03 May 2016 17:36:49 Jędrzej Nowacki wrote:
> > > On Tuesday 03 of May 2016 14:54:50 Sean Harmer wrote:
> > > > Do you mean VM template? If so then yes that's again something that
> > > > should ideally be verified before deployment.
> > > 
> > > Eh, wrong phrasing. Templates are tested. The problem is that the
> > > current process is a bit racy. Updating template is a work that require
> > > time, especially for testing and deployment. Regressions may appear
> > > during that time window. Anyway the process is about to be changed soon.
> > 
> > Here's another new failure mode:
> > 
> > http://testresults.qt.io/logs/qt/qtbase/3356adaae5186a77c2aec458e2a1ebd3e40db8ab/LinuxUbuntu_14_04x86_64LinuxUbuntu_14_04x86_64GCCDeveloperBuild_OutOfSourceBuild_QtLibInfix_QtNamespace/85d6b000f945a84bc84a4f01f53ac65bc05cbe86/buildlog.txt.gz
> > 
> > Failing with:
> > 
> > Determining architecture... ()
> > make: Warning: File `../../../qt/qtbase/config.tests/arch/arch.pro' has
> > modification time 1.3e+03 s in the future
> > /home/qt/work/build/bin/qmake -qtconf /home/qt/work/build/bin/qt.conf
> > -nocache -spec /home/qt/work/qt/qtbase/mkspecs/linux-g++ LIBS+=
> > QMAKE_CXXFLAGS+= INCLUDEPATH+= CONFIG-=app_bundle -o Makefile
> > ../../../qt/qtbase/config.tests/arch/arch.pro
> > ...
> > 
> > agent:2016/05/05 13:07:20 build.go:221: Killed process after timeout
> > (total time)
> > agent:2016/05/05 13:07:20 agent.go:170: Build failed
> > agent:2016/05/05 13:07:20 agent.go:127: ERROR building: Timeout after
> > 5m0s: Maximum duration reached
> > agent:2016/05/05 13:07:20 build.go:158: Error reading from stdout/err:
> > Timeout after 5m0s: Maximum duration reached
> > 
> > Sean

-- 
Dr Sean Harmer | sean.har...@kdab.com | Managing Director UK
KDAB (UK) Ltd, a KDAB Group company
Tel. +44 (0)1625 809908; Sweden (HQ) +46-563-540090
Mobile: +44 (0)7545 140604
KDAB - Qt Experts


Re: [Development] Qt CI reliability

2016-05-05 Thread Sean Harmer
And more problems today:

http://testresults.qt.io/logs/qt/qt3d/83275e6eb72baa5f80d4fbaaf84bc0ffd1b59762/LinuxUbuntu_15_04x86_64LinuxUbuntuTouch_15_04armv7GCCDisableTests_OpenGLES2/f09f2056189e7df3613966a77c6b671ff3d5ea88/buildlog.txt.gz

results in a 404. I hope this doesn't last all of the holiday weekend...

Sean

On Thursday 05 May 2016 11:15:38 Sean Harmer wrote:
> On Tuesday 03 May 2016 17:36:49 Jędrzej Nowacki wrote:
> > On Tuesday 03 of May 2016 14:54:50 Sean Harmer wrote:
> > > Do you mean VM template? If so then yes that's again something that
> > > should ideally be verified before deployment.
> > 
> > Eh, wrong phrasing. Templates are tested. The problem is that the current
> > process is a bit racy. Updating template is a work that require time,
> > especially for testing and deployment. Regressions may appear during that
> > time window. Anyway the process is about to be changed soon.
> 
> Here's another new failure mode:
> 
> http://testresults.qt.io/logs/qt/qtbase/3356adaae5186a77c2aec458e2a1ebd3e40db8ab/LinuxUbuntu_14_04x86_64LinuxUbuntu_14_04x86_64GCCDeveloperBuild_OutOfSourceBuild_QtLibInfix_QtNamespace/85d6b000f945a84bc84a4f01f53ac65bc05cbe86/buildlog.txt.gz
> 
> Failing with:
> 
> Determining architecture... ()
> make: Warning: File `../../../qt/qtbase/config.tests/arch/arch.pro' has
> modification time 1.3e+03 s in the future
> /home/qt/work/build/bin/qmake -qtconf /home/qt/work/build/bin/qt.conf
> -nocache -spec /home/qt/work/qt/qtbase/mkspecs/linux-g++ LIBS+=
> QMAKE_CXXFLAGS+= INCLUDEPATH+= CONFIG-=app_bundle -o Makefile
> ../../../qt/qtbase/config.tests/arch/arch.pro
> ...
> 
> agent:2016/05/05 13:07:20 build.go:221: Killed process after timeout (total
> time)
> agent:2016/05/05 13:07:20 agent.go:170: Build failed
> agent:2016/05/05 13:07:20 agent.go:127: ERROR building: Timeout after 5m0s:
> Maximum duration reached
> agent:2016/05/05 13:07:20 build.go:158: Error reading from stdout/err:
> Timeout after 5m0s: Maximum duration reached
> 
> Sean

-- 
Dr Sean Harmer | sean.har...@kdab.com | Managing Director UK
KDAB (UK) Ltd, a KDAB Group company
Tel. +44 (0)1625 809908; Sweden (HQ) +46-563-540090
Mobile: +44 (0)7545 140604
KDAB - Qt Experts


Re: [Development] Qt CI reliability

2016-05-05 Thread Sean Harmer
On Tuesday 03 May 2016 17:36:49 Jędrzej Nowacki wrote:
> On Tuesday 03 of May 2016 14:54:50 Sean Harmer wrote: 
> > Do you mean VM template? If so then yes that's again something that should
> > ideally be verified before deployment.
> 
> Eh, wrong phrasing. Templates are tested. The problem is that the current
> process is a bit racy. Updating template is a work that require time,
> especially for testing and deployment. Regressions may appear during that
> time window. Anyway the process is about to be changed soon.

Here's another new failure mode:

http://testresults.qt.io/logs/qt/qtbase/3356adaae5186a77c2aec458e2a1ebd3e40db8ab/LinuxUbuntu_14_04x86_64LinuxUbuntu_14_04x86_64GCCDeveloperBuild_OutOfSourceBuild_QtLibInfix_QtNamespace/85d6b000f945a84bc84a4f01f53ac65bc05cbe86/buildlog.txt.gz

Failing with:

Determining architecture... ()
make: Warning: File `../../../qt/qtbase/config.tests/arch/arch.pro' has 
modification time 1.3e+03 s in the future
/home/qt/work/build/bin/qmake -qtconf /home/qt/work/build/bin/qt.conf -nocache 
-spec /home/qt/work/qt/qtbase/mkspecs/linux-g++ LIBS+= QMAKE_CXXFLAGS+= 
INCLUDEPATH+= CONFIG-=app_bundle -o Makefile 
../../../qt/qtbase/config.tests/arch/arch.pro
...

agent:2016/05/05 13:07:20 build.go:221: Killed process after timeout (total 
time)
agent:2016/05/05 13:07:20 agent.go:170: Build failed
agent:2016/05/05 13:07:20 agent.go:127: ERROR building: Timeout after 5m0s: 
Maximum duration reached
agent:2016/05/05 13:07:20 build.go:158: Error reading from stdout/err: Timeout 
after 5m0s: Maximum duration reached
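
As an aside on the make warning in this log: a file reported as modified 
"in the future" can be detected before the build starts. The following Python 
sketch is only an illustration, assuming a plain directory walk is acceptable; 
the function name files_from_the_future and its tolerance parameter are 
hypothetical and not part of the CI.

```python
import os
import time


def files_from_the_future(root, tolerance_s=1.0, now=None):
    """Return sorted (path, seconds_ahead) pairs for files under root whose
    modification time is more than tolerance_s ahead of the local clock."""
    now = time.time() if now is None else now
    skewed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                ahead = os.stat(path).st_mtime - now
            except OSError:
                continue  # file vanished or is unreadable; ignore it
            if ahead > tolerance_s:
                skewed.append((path, ahead))
    return sorted(skewed)
```

A non-empty result for the source tree would point at clock skew between the 
build node and whichever machine produced the sources.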

Sean
-- 
Dr Sean Harmer | sean.har...@kdab.com | Managing Director UK
KDAB (UK) Ltd, a KDAB Group company
Tel. +44 (0)1625 809908; Sweden (HQ) +46-563-540090
Mobile: +44 (0)7545 140604
KDAB - Qt Experts


Re: [Development] Qt CI reliability

2016-05-03 Thread Tuukka Turunen


> -----Original Message-----
> From: sean.harmer On Behalf Of Sean Harmer
> Sent: Tuesday 3 May 2016 15:45
> To: development@qt-project.org
> Cc: Tuukka Turunen <tuukka.turu...@qt.io>
> Subject: Re: [Development] Qt CI reliability
> 
> 
> So in summary, I appreciate the CI is a big complex beast but it's also the
> gateway to getting contributions in and is therefore critical that it runs
> all of the time, or at least as much of the time as possible. Can you
> investigate the feasibility of putting 24x7 support in place please?
> 

Yes, we can investigate this. But it may not be feasible; at least, the level 
of support available outside developers' office hours would most likely be 
just basic infrastructure support. 

It should be noted that we have been (and partially still are) in the process 
of being separated from the Digia group. While the CI part was mainly done 
months ago, there are overall a lot of ongoing issues that stretch the 
capabilities of IT personnel. That is of course no excuse as such, but an item 
to note anyway. 

> > we do have persons
> > dedicated to operate the CI system, as well as support from IT as
> > needed for the infrastructure. In addition, we are still putting a
> > significant effort into developing the CI further and stabilizing it.
> > And, yes, we are also continuously monitoring the infrastructure of
> > the CI systems as well as planning how to improve it further in the future.
> 
> Well, disks filling up over a weekend shows that it isn't monitored
> continuously, or rather, not acted upon. If there isn't 24x7 support then this
> is a valid outcome of course. My argument is that such support is warranted.
> 

Or at least we should make sure that disks that are not cleaned automatically 
are cleaned in advance during office hours. I do agree with you that running 
out of disk space should not be a reason for CI failure, and mostly it has 
not been, despite you hitting that over the weekend. 

Yours,

Tuukka



Re: [Development] Qt CI reliability

2016-05-03 Thread Jędrzej Nowacki
On Tuesday 03 of May 2016 14:54:50 Sean Harmer wrote:
> On Monday 02 May 2016 11:14:24 Jędrzej Nowacki wrote:
> > On Saturday 30 of April 2016 20:26:20 Sean Harmer wrote:
> (...)
> What would be a *very* useful feature would be if we can trigger a test
> build of a change on a particular configuration for such cases where we
> don't have ready access to a configuration locally.
We have this feature, but it is not exposed. Qt account integration is 
missing, as is some kind of limit to protect CI resources. If you have 
something really urgent to test, ping me on IRC and I can run a test run for 
you.

> > I will ask IT about network, it seems that network interface was
> > re-configured during CI run and DHCP assigned a different IP. It should
> > not happen (TM)
> 
> Yes that sort of thing should be done in a specified window out of hours
> after disabling the CI master to be able to disseminate jobs to the nodes.
Well, it was not about maintenance. A node tried to renew its IP lease and 
got a new IP instead of the old one. It may be some kind of DHCP 
misconfiguration, or the operating system failed to ask in time to renew the 
lease. I do not know how on earth this could happen. I need to access the logs.

> > Rule of thumb is: if logs show broken compilation it means: real problem,
> > don't blame CI. There are three main reasons, that I'm aware of, that can
> > cause the problem (sorted according to the probability):
> > 1. One of changes being integrated broke the compilation
> 
> Fine and expected and, with timely failures, not an issue.
> 
> > 2. One of module dependencies broke source compatibility
> 
> This is very rarely an issue, at least for Qt 3D.
> 
> > 3. There was a untested template update (this reason will almost disappear
> > soon)
> 
> Do you mean VM template? If so then yes that's again something that should
> ideally be verified before deployment.
Eh, wrong phrasing. Templates are tested. The problem is that the current 
process is a bit racy. Updating a template is work that requires time, 
especially for testing and deployment. Regressions may appear during that time 
window. Anyway, the process is about to be changed soon. 

Cheers,
 Jędrek


Re: [Development] Qt CI reliability

2016-05-03 Thread Sean Harmer
On Monday 02 May 2016 11:14:24 Jędrzej Nowacki wrote:
> On Saturday 30 of April 2016 20:26:20 Sean Harmer wrote:
> > Hi,
> 
> Hi,
> 
> > after yet another 5 hour wait just to be greeted with yet another random
> > failure with no build logs I'm getting really tired of the poor
> > reliability of the Qt CI system.
> 
> I'm sorry about that.
> 
> > https://codereview.qt-project.org/#/c/157590/
> > 
> > has been greeted with genuine failures, failures in qtdeclarative,
> > qtxmlpatterns, multiple random failures in qt3d despite being a simple
> > change which I suspect are due to issues on one or more CI nodes.
> 
> I scanned through the failures and it seems that you had a very bad luck. I
> know CI should not be about luck and therefore I'm really sorry about that.

No need for you to apologize personally, I'm complaining about the policy to 
not have 24x7 support. I don't blame any individual and I know it's still 
improving on the technical front.

> 
> You tried to stage the change 7 times
> 1. Failed to compile qt3d (broken change
> https://codereview.qt-project.org/#/c/157593/)

Yup, such changes are part of normal development, and are expected. These are 
not an issue as long as the CI fails in a timely manner.

What would be a *very* useful feature would be if we can trigger a test build 
of a change on a particular configuration for such cases where we don't have 
ready access to a configuration locally.

> 2. Looks like a network connectivity failure, logs were not flushed as they
> should, so you can blame CI
> 3. Blame CI, we failed to acquire a free machine for 5h, I will look at that
> later
> 4. Failed to compile qt3d (broken change
> https://codereview.qt-project.org/#/c/157590/)
> 5. Failed to compile qt3d (broken change
> https://codereview.qt-project.org/#/c/157590/), same as 4
> 6. Looks like a network connectivity failure, logs were not flushed as they
> should, so you can blame CI, same as 2

Right, any time I find a link that points to nothing for the build logs I'm 
suspicious of the CI.

> 7. Passed
> 
> I will ask IT about network, it seems that network interface was
> re-configured during CI run and DHCP assigned a different IP. It should not
> happen (TM)

Yes, that sort of thing should be done in a specified window out of hours, 
after stopping the CI master from disseminating jobs to the nodes.

> 
> Rule of thumb is: if logs show broken compilation it means: real problem,
> don't blame CI. There are three main reasons, that I'm aware of, that can
> cause the problem (sorted according to the probability):
> 1. One of changes being integrated broke the compilation

Fine and expected and, with timely failures, not an issue.

> 2. One of module dependencies broke source compatibility

This is very rarely an issue, at least for Qt 3D.

> 3. There was a untested template update (this reason will almost disappear
> soon)

Do you mean VM template? If so then yes that's again something that should 
ideally be verified before deployment.

The other factor that contributes is infrastructure: full disks, network 
outages, reconfigurations etc. These should be monitored for and acted upon and 
where possible, processes changed to avoid these situations.
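
To make the "monitored for and acted upon" point concrete, here is a minimal 
Python sketch of a disk-usage check, offered as an assumption of what such 
monitoring could look like and not as a description of the existing setup. 
The function name disk_alerts, the 90% threshold and the example paths are 
hypothetical; shutil.disk_usage is the standard library call.

```python
import shutil


def disk_alerts(paths, max_used_fraction=0.9, usage=shutil.disk_usage):
    """Return (path, used_fraction) for every path at or above the threshold.

    usage is injectable so the check can be exercised without real disks.
    """
    alerts = []
    for path in paths:
        total, used, _free = usage(path)
        fraction = used / total if total else 1.0
        if fraction >= max_used_fraction:
            alerts.append((path, round(fraction, 3)))
    return alerts
```

Something like disk_alerts(["/", "/home/qt/work"]) run from a scheduler, with 
a non-empty result paging whoever is on duty, would catch a disk filling up 
over a weekend before integrations start to fail.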

> *. There was huge radiation in Finland, but that you would know from the
> news ;-)

:)

Anyway, thank you for looking into the issues here.

Cheers,

Sean
-- 
Dr Sean Harmer | sean.har...@kdab.com | Managing Director UK
KDAB (UK) Ltd, a KDAB Group company
Tel. +44 (0)1625 809908; Sweden (HQ) +46-563-540090
Mobile: +44 (0)7545 140604
KDAB - Qt Experts


Re: [Development] Qt CI reliability

2016-05-03 Thread Sean Harmer
On Monday 02 May 2016 05:05:07 Tuukka Turunen wrote:
> Hi Sean,
> 
> Firstofall, I do apologize for the inconvenience caused by the CI system. We
> are fully aware of the situation, and the effect is has for productivity.
> All the developers of The Qt Company are using exactly the same CI system.

Yes, that is a lot of people to be held up by an oft-times unreliable piece of 
infrastructure.
 
> To address the problems, we had with Jenkins based CI we started to develop
> new CI system built from the ground up to serve the needs of Qt.
> Unfortunately, it has taken us longer than we anticipated to get the new CI
> system stabile, and there are still a lot of failures caused by the CI
> itself. We are also continuously improving the test asset to make it less
> prone for errors, including identification of flaky cases and fixing and/or
> temporarily blacklisting these. 

The flaky test related failure rate has indeed improved a great deal over the 
last 12 months. Thanks to all who have helped with this. The problems we're 
suffering now seem to be more related to infrastructure issues as Jędrek has 
pointed out.

I think part of the problem is also perception. From outside of TQC, 
contributors have no visibility of the status of an integration beyond the 
gerrit "INTEGRATING" status. This turns it into a black box that can take most 
of a working day to result in a frustrating failure.

Would it be possible for you to expose the view of the currently running 
integrations and their status on each node/configuration so that we can see if 
something looks like it might be broken and can approach a sysadmin?
 
> While we unfortunately do not have 24x7 sysadmins,

And this is a problem imho. The CI is a critical system that needs to be 
running 24x7 to support people in different timezones and during out of hours 
work in Europe. If I'm busy with paying project work during office hours, then 
I try to do what I can on Qt 3D out of normal hours, but several times I've had 
to waste hours at weekends and evenings trying to shepherd changes through. 
This in turn then just adds to the load on the CI system during office hours 
when the system is able to integrate once again.

So in summary, I appreciate the CI is a big complex beast but it's also the 
gateway to getting contributions in and is therefore critical that it runs all 
of the time, or at least as much of the time as possible. Can you investigate 
the feasibility of putting 24x7 support in place please?

> we do have persons
> dedicated to operate the CI system, as well as support from IT as needed
> for the infrastructure. In addition, we are still putting a significant
> effort into developing the CI further and stabilizing it. And, yes, we are
> also continuously monitoring the infrastructure of the CI systems as well
> as planning how to improve it further in the future.

Well, disks filling up over a weekend shows that it isn't monitored 
continuously, or rather, not acted upon. If there isn't 24x7 support then this 
is a valid outcome of course. My argument is that such support is warranted.

Kind regards,

Sean
-- 
Dr Sean Harmer | sean.har...@kdab.com | Managing Director UK
KDAB (UK) Ltd, a KDAB Group company
Tel. +44 (0)1625 809908; Sweden (HQ) +46-563-540090
Mobile: +44 (0)7545 140604
KDAB - Qt Experts


Re: [Development] Qt CI reliability

2016-05-02 Thread Jędrzej Nowacki
On Saturday 30 of April 2016 20:26:20 Sean Harmer wrote:
> Hi,
Hi,

> after yet another 5 hour wait just to be greeted with yet another random
> failure with no build logs I'm getting really tired of the poor reliability
> of the Qt CI system.
I'm sorry about that.
 
> https://codereview.qt-project.org/#/c/157590/
> 
> has been greeted with genuine failures, failures in qtdeclarative,
> qtxmlpatterns, multiple random failures in qt3d despite being a simple
> change which I suspect are due to issues on one or more CI nodes.

I scanned through the failures and it seems that you had very bad luck. I 
know CI should not be about luck and therefore I'm really sorry about that. 

You tried to stage the change 7 times
1. Failed to compile qt3d (broken change 
https://codereview.qt-project.org/#/c/157593/)
2. Looks like a network connectivity failure, logs were not flushed as they 
should, so you can blame CI
3. Blame CI, we failed to acquire a free machine for 5h, I will look at that 
later
4. Failed to compile qt3d (broken change 
https://codereview.qt-project.org/#/c/157590/)
5. Failed to compile qt3d (broken change 
https://codereview.qt-project.org/#/c/157590/), same as 4
6. Looks like a network connectivity failure, logs were not flushed as they 
should, so you can blame CI, same as 2
7. Passed

I will ask IT about the network; it seems that a network interface was 
reconfigured during the CI run and DHCP assigned a different IP. It should not 
happen (TM)

Rule of thumb: if the logs show a broken compilation, it is a real problem, 
so don't blame CI. There are three main reasons that I'm aware of that can 
cause the problem (sorted by probability):
1. One of the changes being integrated broke the compilation
2. One of the module dependencies broke source compatibility
3. There was an untested template update (this reason will almost disappear 
soon)
*. There was huge radiation in Finland, but that you would know from the news 
;-)
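
The rule of thumb can be written down as a small triage function. This Python 
sketch is only an illustration of the rule as stated, applied to the seven 
staging attempts listed above; the names triage and ATTEMPTS are hypothetical, 
and the two boolean inputs are a simplification that ignores other failure 
kinds such as flaky tests.

```python
from collections import Counter


def triage(passed, log_shows_compile_error):
    """Classify one integration attempt according to the rule of thumb."""
    if passed:
        return "passed"
    if log_shows_compile_error:
        return "real problem"  # broken compilation in the logs: not CI's fault
    return "suspect CI"        # no usable log or no compile error


# (passed, log_shows_compile_error) for the seven attempts listed above.
ATTEMPTS = [
    (False, True),   # 1. qt3d failed to compile
    (False, False),  # 2. network failure, logs not flushed
    (False, False),  # 3. no free machine acquired for 5h
    (False, True),   # 4. qt3d failed to compile
    (False, True),   # 5. same as 4
    (False, False),  # 6. same as 2
    (True, False),   # 7. passed
]

summary = Counter(triage(*attempt) for attempt in ATTEMPTS)
```

For this change the tally is three real problems, three attempts where CI is 
the suspect, and one pass.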

Cheers,
 Jędrek


Re: [Development] Qt CI reliability

2016-05-01 Thread Tuukka Turunen

Hi Sean,

First of all, I do apologize for the inconvenience caused by the CI system. We 
are fully aware of the situation, and the effect it has on productivity. All 
the developers of The Qt Company are using exactly the same CI system. 

To address the problems we had with the Jenkins-based CI, we started to develop 
a new CI system built from the ground up to serve the needs of Qt. 
Unfortunately, it has taken us longer than we anticipated to get the new CI 
system stable, and there are still a lot of failures caused by the CI itself. 
We are also continuously improving the test assets to make them less prone to 
errors, including identifying flaky cases and fixing and/or temporarily 
blacklisting them. 

While we unfortunately do not have 24x7 sysadmins, we do have persons dedicated 
to operate the CI system, as well as support from IT as needed for the 
infrastructure. In addition, we are still putting a significant effort into 
developing the CI further and stabilizing it. And, yes, we are also 
continuously monitoring the infrastructure of the CI systems as well as 
planning how to improve it further in the future.

Yours,

Tuukka

> -----Original Message-----
> From: Development [mailto:development-
> bounces+tuukka.turunen=qt...@qt-project.org] On Behalf Of Sean Harmer
> Sent: Saturday 30 April 2016 22:26
> To: development@qt-project.org
> Subject: [Development] Qt CI reliability
> 
> Hi,
> 
> after yet another 5 hour wait just to be greeted with yet another random
> failure with no build logs I'm getting really tired of the poor reliability
> of the Qt CI system.
> 
> https://codereview.qt-project.org/#/c/157590/
> 
> has been greeted with genuine failures, failures in qtdeclarative,
> qtxmlpatterns, multiple random failures in qt3d despite being a simple
> change which I suspect are due to issues on one or more CI nodes.
> 
> I appreciate the new CI is more efficient than the previous implementation
> but it still has a very high rate of failed integration attempts. For a
> piece of infrastructure that is critical to the Qt Project the CI system
> should have 24x7 sysadmin support. I'm not expecting devs to be doing this.
> The Qt Company has sys admins supported by license fees yet we see example
> after example of the support being substandard. Certificates expiring, disks
> filling up, upgrades being performed to coincide with feature freezes,
> upgrades performed then going home for the weekend leaving systems in an
> unfit state.
> 
> Such long waits resulting in random failures are killing both productivity
> and motivation. This whole week seems to have been nothing other than an
> endless cycle of staging and re-staging. We're doing our best to get Qt 3D
> into shape for Qt 5.7 but this is being hamstrung by the infrastructure.
> 
> I don't understand how sysadmins can be caught out by disks filling up
> without them being notified. Why is there no server monitoring in place?
> Likewise for the numerous SSL certificate expirations over the last couple of
> years.
> 
> Now being in the same situation of seemingly endless failures over a holiday
> weekend, it seems my backlog will continue to get longer whilst my will
> power to dedicate my personal time to meeting the release deadline so that
> TQC sysadmins can go on their summer vacations wanes.
> 
> I would be very grateful if anybody happens to read this and is available to
> look at the current CI issues. Please can TQC management dedicate
> resources to sorting out the underlying resource shortages for administering
> the CI, putting in place some reliable server monitoring that actually gets
> acted upon and giving the project the system it should have.
> 
> Thank you and enjoy what is left of the holiday weekend.
> 
> Sean
> --
> Dr Sean Harmer | sean.har...@kdab.com | Managing Director UK
> Klarälvdalens Datakonsult AB, a KDAB Group company
> Tel. UK +44 (0)1625 809908, Sweden (HQ) +46-563-540090
> KDAB - Qt Experts - Platform-independent software solutions