reactions inlined
On 05/10/2012 05:01 PM, Osztrogonac Csaba wrote:
Hi All,
Alexis Menard wrote:
Hi,
Reading Simon's email about removing Qt4, I saw that there is a
plan to move to Amazon EC2.
State of the art of Qt gardening:
- Mostly Ossy alone is the gardener, which is unacceptable. Apple made a
move towards improving their bots (when you see kling gardening it
tells you they changed something), Google is already pretty good, GTK
also; we need to be better. While people at the summit praised the Qt
bots for being green all the time, I think it hides a terrible truth:
our skiplist grows, grows, grows and nobody looks after it, which
conflicts a bit with trying to release a stable trunk for Qt 5.0. How
many mails do we receive from Ossy complaining about quality?
It's not exactly true that I'm the only gardener. I'm working on
gardening together with my buildbot group. They are part-time developers,
because they are students and have courses, exams, etc. Most of them
aren't WebKit committers yet, so their gardening patches are usually
committed by me or somebody else from here. (But you can find their
names in the commit logs and changelogs, of course. :) )
I have to agree with the other point: with this small group we have
the resources (enough time) only for fire-fighting: detecting who/which
commit broke which tests, updating expected results if needed, filing
bug reports, commenting on bugs, buildfixes, etc. We don't have enough
time to fix all the bugs on behalf of those who caused them.
But gardening is very hard if most developers don't care about QA at
all. When I comment on a bug with "your patch broke X.Y. layout/API
test, diff: ...", I regularly get the question: "How can I run this
test?" And that isn't good, because it means this developer has never
run the tests before. (But everybody should before committing.)
Another problem is that many developers insist on their buggy patch
staying in trunk, but they don't care about fixing the bug. In this
case all we can do is skip the newly failing tests, because red bots
with many failing tests would make catching new regressions much more
complex, sometimes impossible. But in my opinion rolling out a buggy
patch and relanding it after fixing would cause less pain for everybody
than a growing, growing and growing skiplist.
I don't know why folks hate rolling out patches. It doesn't mean that the
patch is wrong altogether. It isn't a death sentence for the patch or
the author. :) It only means that the patch caused some
trouble/regression and should be fixed. And fixing it offline is less
painful for others than leaving a buggy patch in trunk.
The Chromium guys usually roll out their own patches if they broke a
test on the Qt bot, before I even notice. Really. We should follow their
good practice. ;)
- There is a huge delta, machine-wise, between what the bot is running
and what people use to develop. The bot runs Ubuntu; many of us run
ArchLinux/OpenSuse while some of us run Ubuntu. This leads to results on
your machine that differ from what the bot produces. We have encountered
many, many, many times people saying: "it passes on my machine but not on
the bot" -> added to the skiplist, because nobody can really see what's
going on with the bot. Szeged tried their best to provide a virtual
machine, but it was a bit of a failure as the VM doesn't behave the same
as the bot, and the VM behaves differently depending on whether you run
it on VMWare or VirtualBox.
Unfortunately the VMWare image wasn't the best solution. So we then
created a meta package for Ubuntu 11.10 which installs all dependencies:
https://launchpad.net/~u-szeged/+archive/sedkit
With this meta package you can install a full QtWebKit development
environment in an hour.
Now the direction is moving to an Amazon Ubuntu image. But I think it is
still papering over the problem. It is _very good_ (but expensive) for
ensuring everybody can simply reproduce the bot results. But we don't
develop for only one platform. More platforms reveal more hidden and
maybe serious bugs. If your patch works fine on the one and only
reference platform, it doesn't mean there isn't any bug in it.
The biggest problem is that folks who don't use Ubuntu 11.10 get
thousands of failing tests because of minor font differences. In this
case the best solution isn't "I can't reproduce the results, so I won't
run layout tests anymore." It would be more valuable for the whole
project if font(config) experts tried to make WebKit, Qt, fontconfig or
anything else use the same fonts. I don't know if it is possible or not;
I don't know anything about fonts. Is it possible somehow to bundle a
chosen fontconfig with Qt or WebKit and use it for regression testing on
all distros, instead of sweating because of different system fontconfig
versions?
You are speaking about Linux, but it's not the only system where we want
coverage. For example, on Mac fontconfig does not play a role in the font
game. We could use it, but then we would lose the coverage for the real
use case. Btw, there is some light in the dark land of fonts:
- I have done some work to unify test results between Linux and Mac;
hopefully I can finish it in the near future.
- In Ubuntu 12.0, a strange bug has been fixed in freetype which made
the Ahem font produce wrong metrics (WidthXHeight=NxN+1 instead of NxN).
Ahem is used in a lot of tests in the css* directories. Currently our
expectations are wrong, but if we fix them these metrics will match
across distros (everybody has been using the newer freetype for a long
time except our beloved, stable Debian :D )
- We don't have any gardening plan.
The missing gardening plan is not the only problem. In my opinion
introducing contribution rules would be more important.
For example:
- Developers should build the patch and run tests before committing.
(Or at least watch the bots after landing and fix/roll out quickly if
something goes wrong.)
- What should I do if I broke the build / a layout test / an API test?
- What should a gardener do if somebody doesn't care about the
regression he/she caused?
- What should the boss do if somebody regularly and intentionally breaks
the rules? :)
I have to protest a bit. As Ossy describes it, it's really simple and
straightforward: when somebody breaks a test, it means his patch is
buggy, he should find the error in his changes, and everything will be
fine. In reality, this is not always the case. When you break a test, it
could mean different things:
1. you did it wrong
Obviously you need to fix your patch
2. there is a bug in the system that you triggered somehow (even with
a change that is totally right on its own)
Of course the right thing to do is to investigate the problem. But it
could be very complex; maybe the bug exists in a different subsystem
that you don't know well. I don't think it is always possible to find
the manpower to fight these bugs.
3. there is a bug/imperfection in the test infrastructure that you
triggered
Well, this is pretty annoying and relatively common. We should detect
and solve these issues, but it's not really fair to stop a good patch
from landing until somebody fixes the tools. Note that working out of
trunk on top of your previous work is possible, but it's not fun because
you have to struggle more with rebasing.
4. you caused some change that is not really a bug
Like some pixel differences that actual users could not even notice.
I would say if you make such a change then let's update the
expectations, but it's not always possible since you cannot test your
patch in each environment where we want coverage. (And if you don't use
Ubuntu or Debian you cannot even produce results locally for
Linux-desktop.)
After all, I think we should be careful about what rules we introduce.
They should satisfy two requirements:
- we have to keep them. Not just for the first week, not just for the
first month, but always. :)
- they must not block development too much. Who cares if we are
rock stable if we cannot follow the evolution of the web?!
I agree with Ossy that we should put more effort into bug fixing /
stabilisation, but I don't agree that we should banish the skip list
once and for all. Actually there is no stable port of WebKit where the
skip list is unused. I would say, let's try to find a better balance
between stability and the speed of development.
What could be improved:
- We need to make a gardening plan. We can't be serious about making
web browsers/APIs without improving our coverage. I know we don't have
many resources, but I think it should be ok to have one person doing it
for a week and then rotate. Really, it's a week that may be boring, but
it comes around only rarely (about once every two to three months). This
will free Ossy up to do something else, so Ossy can go back to proper
coding. I can make that list if people agree. Also it needs to be
enforced (maybe reviews could be the exception).
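As a rough illustration of the rotation arithmetic above (one gardener
per week, each turn coming around every two to three months), here is a
minimal Python sketch. The names, team size and start date are
hypothetical placeholders, not an actual roster; ten people is simply an
assumption that makes each turn recur every ten weeks.

```python
from datetime import date, timedelta


def weekly_rotation(gardeners, start, weeks):
    """Return (week_start, gardener) pairs, cycling through the list one week each."""
    return [
        (start + timedelta(weeks=i), gardeners[i % len(gardeners)])
        for i in range(weeks)
    ]


# Hypothetical roster of ten volunteers; real names would come from the team.
team = [f"dev{i}" for i in range(1, 11)]

# Hypothetical start date (a Monday); twelve weeks shows the cycle wrapping around.
schedule = weekly_rotation(team, date(2012, 5, 14), 12)

for week_start, who in schedule:
    print(week_start.isoformat(), who)
```

With ten names in the list the same person is on duty again ten weeks
later; a shorter list shortens that interval accordingly.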
Gardening isn't so simple that one person can do it alone. One person
can be enough for fire-fighting: buildfixes, updating expected files,
reporting bugs, fixing some trivial bugs. But it isn't enough to fix all
the regressions caused by others who don't take responsibility at all,
or regressions that occur in a part of WebKit you don't know anything
about. Not to mention that there are many complex tests, where it isn't
trivial to decide whether the new result is correct or not.
I added our gardening timetable to this wiki:
https://trac.webkit.org/wiki/QtWebKitBuildBots
All new volunteers are very welcome. ;-) It would be great if you guys
at INdT could join; you are close to the PDT timezone. And handling
problems while they are fresh is always simpler than waiting for the
Hungarian morning and trying to solve dozens of new regressions, broken
builds, assertions, flaky tests, ...
- We need to be able to test/stress/break the bot environment. Today
the fact that none of us can mess around with the bot makes it hard to
reproduce bot failures that you can't see on your machine. While I do
understand that Ossy doesn't give us the key to the bot (and we don't
want that), we still need to have one to mess around with.
We have hacked the layout test system many times in the past to make it
able to run more than one bot on the same 8-24 core machine. But this
works only for a single Linux user. We still have a strict limitation:
another user trying to run tests on the same machine can kill all the
bots, so for now only one user is allowed. In this situation it isn't a
good idea for anybody to log in and hack on something. When I have to do
it, I'm very, very careful, but sometimes I have broken everything
accidentally.
Not strictly connected to your points, but another infrastructural
thing: when will we be able to run tests in parallel? Is it reliable
right now? Could we make it the default configuration of nrwt (except on
the bots, until it is really stable) so folks would not have to know the
command line switch by heart? (As far as I know it's not simple, because
you need to call the real nrwt and not the perl wrapper, and it's
slightly different.) It would be much more fun to run the tests before
uploading / landing your patch if they did not take years to run.
-kbalazs