Re: [webkit-qt] The bot infrastructure and gardening.

Balazs Kelemen Thu, 10 May 2012 15:07:24 -0700

reactions inlined

On 05/10/2012 05:01 PM, Osztrogonac Csaba wrote:

Hi All,
Alexis Menard wrote:
Hi,

Reading Simon's email about removing Qt4, I saw there was a
plan to move to Amazon EC2.

State of the art of Qt gardening:

- Mostly Ossy alone is the gardener, which is unacceptable. Apple made a
move towards improving their bots (when you see kling gardening it
tells you they changed something), Google is already pretty good, GTK
also; we need to be better. While at the summit people praised the Qt bots
for being green all the time, I do think it hides a terrible truth: our
skiplist grows, grows, grows and nobody looks after it, which conflicts a
bit with trying to release a stable trunk for Qt 5.0. How many mails
do we receive from Ossy complaining about quality?
It's not exactly true that I'm the only gardener. I'm working on
gardening together with my buildbot group. They are part-time developers,
because they are students and have courses, exams, etc. Most of them
aren't WebKit committers yet, so their gardening patches are usually
committed by me or somebody else from here. (But you can find their
names in the commit logs and changelogs, of course. :) )
With the other thing I have to agree: with this small group we have resources
(enough time) only for fire-fighting: detecting who/which commit broke
which tests, updating expected results if needed, filing bug reports,
commenting on bugs, build fixes, etc. We don't have enough time to fix all
the bugs in place of those who caused them.
But gardening is very hard if most of the developers don't care about QA
at all. When I comment on a bug with "your patch broke X.Y. layout/API test,
diff: ...", I regularly get the question: "How can I run this test?" And that
isn't good, because it means this developer has never run the tests before.
(But everybody should, before committing.) The other problem is that many
developers insist on their buggy patch staying in trunk, but they don't care
about fixing the bug. In this case all we can do is skip the newly failing
tests, because red bots with many failing tests would
make catching new regressions much more complex, sometimes impossible. But
in my opinion rolling out a buggy patch and relanding it after fixing would
cause less pain for everybody than a growing, growing and growing skiplist.
I don't know why folks hate rolling out patches. It doesn't mean that the
patch is wrong at all. It isn't a capital sentence for the patch or the
author. :) It only means that the patch caused some trouble/regression and
should be fixed. And fixing offline is less painful for others than leaving
a buggy patch in trunk. The Chromium guys usually roll out their own patches
if they broke a test on the Qt bot before I noticed. Really. We should
follow their good practice. ;)
- There is a huge delta, machine-wise, between what the bot is running and
what people use to develop. The bot runs Ubuntu, many of us run
ArchLinux/OpenSuse while some of us run Ubuntu. It leads to differences
between the results the bot produces and what you see on your machine.
We have encountered many, many, many times people saying: "it passes on
my machine but not on the bot" -> Added to the Skiplist because nobody
can really see what's going on with the bot. Szeged tried their best to
provide a virtual machine but it was a bit of a failure, as the VM
doesn't behave the same as the bot, and the VM behaves differently
depending on whether you run it on VMWare or VirtualBox.
Unfortunately the VMWare image wasn't the best solution. So then we
created a meta package for Ubuntu 11.10 which installs all dependencies:
https://launchpad.net/~u-szeged/+archive/sedkit
With this meta package you can install a full QtWebKit development
environment in an hour.
Now the direction is moving to an Amazon Ubuntu image. But I think it is still
papering over the problem. It is _very good_ (but expensive) for ensuring
everybody can simply reproduce the bot results. But we don't develop for only
one platform. More platforms reveal more hidden and maybe serious bugs. If
your patch works fine on the one and only reference platform, it doesn't mean
there isn't any bug in it.
The biggest problem is that folks who don't use Ubuntu 11.10 get thousands of
failing tests because of minor font differences. In this case the best
solution isn't "I can't reproduce the results, so I won't run layout tests
anymore." It would be more valuable for the whole project if font(config)
experts tried to make WebKit, Qt, fontconfig or anything else use the same
fonts. I don't know if it is possible or not, I don't know anything about
fonts. Is it possible somehow to bundle a chosen fontconfig with Qt or with
WebKit and use it for regression testing on all distros instead
of sweating because of different system fontconfig versions?

You are speaking about Linux, but it's not the only system where we want
coverage. For example, on Mac fontconfig does not play a role in the font
game. We could use it, but then we would lose the coverage for the real use
case. Btw, there is some light in the dark land of fonts:

- I have done some work to unify test results between Linux and Mac;
hopefully I can finish it in the near future.

- In Ubuntu 12.0, a strange bug has been fixed in freetype which made the Ahem
font produce wrong metrics (WidthXHeight=NxN+1 instead of NxN). Ahem is used
in a lot of tests in the css* directories. Currently our expectations are
wrong, but if we fix them these metrics will match across distros (everybody
has been using the newer freetype for a long time except our beloved, stable
Debian :D )

- We don't have any gardening plan.
The missing gardening plan is not the only problem. In my
opinion introducing contributing rules would be more important.
For example:
 - Developers should build the patch and run tests before committing.
   (Or at least watch the bots after landing and fix/roll out quickly if
   something goes wrong.)
 - What should I do if I broke the build / a layout test / API test?
 - What should a gardener do if somebody doesn't care about the regression
   he/she caused?
 - What should the boss do if somebody regularly and intentionally breaks
   the rules? :)

I have to protest a bit. As Ossy describes it, it's really simple and
straightforward: when somebody breaks a test then it means his patch is
buggy, he should find the error in his changes, and everything will be fine.
In reality, this is not always the case. When you break a test, it could
mean different things:

    1. you did it wrong
Obviously you need to fix your patch.

    2. there is a bug in the system that you triggered somehow (even with
       a totally right change on its own)
Of course the right thing to do is to investigate the problem. But it
could be very complex; maybe the bug exists in a different subsystem that
you don't know well. I don't think it is always possible to find the
manpower to fight these bugs.

    3. there is a bug/imperfection in the test infrastructure that you
       triggered
Well, this is pretty annoying and relatively common. We should detect
and solve these issues, but it's not really fair to stop a good patch from
landing until somebody fixes the tools. Note that working out of trunk on
top of your previous work is possible, but it's not fun because you have to
struggle more with rebasing.

    4. you caused some change that is not really a bug
Like some pixel differences that the actual users could not even notice.
I would say if you make such a change then let's update the expectations,
but it's not always possible since you cannot test your patch in each
environment where we want coverage. (And if you don't use Ubuntu or Debian
you cannot even produce results locally for Linux-desktop.)

After all, I think we should be careful about what rules we introduce.
They should satisfy two requirements:
- we have to keep them. Not just the first week, not just the first
month, but always. :)
- they must not block the development too much. Who cares if we are
rock stable if we cannot follow the evolution of the web?!

I agree with Ossy that we should allocate more effort to bug fixing /
stabilisation, but I don't agree that we should banish the
skip list once and for all. Actually there is no stable port of WebKit
where the skip list is unused. I would say, let's try to find a better
balance between stability and the speed of development.

What could be improved :

- We need to make a gardening plan. We can't be serious about making
web browsers/APIs without improving our coverage. I know we don't have
many resources but I think it should be ok to have one person doing it
for a week and then take turns. Really, it's a week, maybe boring, but it
only comes around once in a long while (roughly once every two or three
months). This will free Ossy up to do something else, so Ossy can go back
to proper coding. I can make that list if people agree. Also it needs to
be enforced (maybe reviews could be the exception).
Gardening isn't so simple that one person alone can do it. One person can be
enough for fire-fighting: build fixes, updating expected files, reporting
bugs, fixing some trivial bugs. But it isn't enough to fix all regressions
caused by others who don't take any responsibility, or regressions that
occurred in a part of WebKit you don't know anything about. Not to mention
there are many complex tests, and it isn't trivial to decide whether the new
result is correct or not.

I added our gardening timetable to this wiki:
https://trac.webkit.org/wiki/QtWebKitBuildBots
All new volunteers are very welcome. ;-) It would be great if you guys at
INdT could join, since you are near the PDT timezone. And handling problems
while they are fresh is always simpler than waiting for the Hungarian morning
and trying to solve dozens of
new regressions, broken builds, assertions, flaky tests, ...
- We need to be able to test/stress/break the bot environment. Today
the fact that none of us can mess around with the bot makes it hard to
reproduce the failures of the bot that you can't see on your machine.
While I do understand that Ossy doesn't give us the key to the bot (and
we don't want that), we still need to have one to mess around with.
We hacked many times in the past to make the layout test system able to run
more than one bot on the same 8-24 core machine. But this still works for
only one Linux user. We still have a strict limitation: another user trying
to run tests on the same machine can kill all the bots, so for now only one
user is allowed. In this case it isn't a good idea for anybody to log in and
hack on something. When I have to do it, I'm very, very careful, but
sometimes I break everything accidentally.

Not strictly in connection with your points, but another infrastructural
thing: when will we be able to run tests in parallel? Is it reliable right
now? Could we make it the default configuration of nrwt (except on bots,
until it is really stable) so folks would not have to know the command line
switch by heart? (As far as I know it's not simple, because you need to call
the real nrwt and not the Perl wrapper, and it's slightly different.) It
would be much more fun to run the tests before uploading / landing
your patch if they did not run for years.

-kbalazs

_______________________________________________
webkit-qt mailing list
[email protected]
http://lists.webkit.org/mailman/listinfo.cgi/webkit-qt
