[webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Adam Barth
Not to point fingers, but we've been having trouble keeping
build.webkit.org green these past few weeks.  As I write this message,
every platform is broken, again.  As the project scales, polluting the
build with brokenness impacts more developers and drains more
productivity.

Here are some approaches we could use to turn this tragedy of the
commons around:

1) Adopt a "roll out first, ask questions later" ethic.  The vast
majority of changes are not important enough to break the build for
everyone else.  If we adopt this ethic, committers would feel free to
roll out brokenness to unbreak the build, and contributors shouldn't
be offended if their patch is rolled out without their knowledge.  We
can always re-land the broken patch later once it actually works.

2) Require pre-commit vetting of patches.  We have the resources to
build and test every patch on at least one platform before landing the
patch in the main tree.  Vetting patches before landing will help us
avoid breaking every platform at once.  Once the patch has been
vetted, it can either be landed mechanically (i.e., by commit-queue)
or manually.

Here's how I would design the life and times of a patch:

1) Contributor uploads patch and nominates it for review.
2) Patch vetted by the EWS on numerous platforms.
3) If the EWS finds a problem, return to step 1.
4) Reviewer marks patch review+.
5) Committer decides the patch is ready to land.
6) Patch built and tested against top-of-tree on at least one platform.
7) If the patch fails to build or pass tests, return to step 1.
8) Patch landed.
9) If the patch turns any of the core builders red, patch is rolled
out, return to step 1.
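
The nine steps above can be read as a series of gates. The following is only a minimal sketch of that reading, not how the EWS or commit-queue is actually implemented; the class name, field names, and returned strings are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PatchChecks:
    """Outcome of each gate for one landing attempt (hypothetical inputs)."""
    ews_green: bool            # steps 2-3: EWS result on the platforms it covers
    reviewed: bool             # step 4: a reviewer marked the patch review+
    committer_ready: bool      # step 5: a committer decided it is ready to land
    top_of_tree_green: bool    # steps 6-7: build and tests against top-of-tree
    core_builders_green: bool  # step 9: core builders after the patch landed


def next_action(checks: PatchChecks) -> str:
    """Return what happens to the patch, checking the gates in step order."""
    if not checks.ews_green:
        return "return to step 1: EWS found a problem"
    if not checks.reviewed:
        return "wait for review+"
    if not checks.committer_ready:
        return "wait for a committer"
    if not checks.top_of_tree_green:
        return "return to step 1: failed against top-of-tree"
    if not checks.core_builders_green:
        return "roll out and return to step 1"
    return "landed"
```

Skipping steps 6 and 7 corresponds to never evaluating `top_of_tree_green`, which pushes every such failure to the post-landing gate in step 9.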

I suspect most of our brokenness is coming from committers skipping steps 6 and 7.

Adam
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Jeremy Orlow
LGTM.  The only thing I'd add is that we REALLY need emails to start going
out to webkit-dev (and ideally to the suspected patch owners as well) when
things do break.  What is this blocked on?


Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Xan Lopez
On Fri, Feb 26, 2010 at 11:36 AM, Adam Barth aba...@webkit.org wrote:
 Not to point fingers, but we've been having trouble keeping
 build.webkit.org green these past few weeks.  As I write this message,
 every platform is broken, again.  As the project scales, polluting the
 build with brokenness impacts more developers and drains more
 productivity.

 Here are some approaches we could use to turn this tragedy of the
 commons around:

 1) Adopt a "roll out first, ask questions later" ethic.  The vast
 majority of changes are not important enough to break the build for
 everyone else.  If we adopt this ethic, committers would feel free to
 roll out brokenness to unbreak the build, and contributors shouldn't
 be offended if their patch is rolled out without their knowledge.  We
 can always re-land the broken patch later once it actually works.

In my experience this is more or less the current policy, especially
for build breakage (as opposed to test breakage). Maybe a bit less
hard-line, in that we usually try to contact the culprit and give them
some time to fix issues, but I think there's no remorse in rolling out
patches if there's brokenness and nobody is working on fixing it.


 2) Require pre-commit vetting of patches.  We have the resources to
 build and test every patch on at least one platform before landing the
 patch in the main tree.  Vetting patches before landing will help us
 avoid breaking every platform at once.  Once the patch has been
 vetted, it can either be landed mechanically (i.e., by commit-queue)
 or manually.

 Here's how I would design the life and times of a patch:

 1) Contributor uploads patch and nominates it for review.
 2) Patch vetted by the EWS on numerous platforms.
 3) If the EWS finds a problem, return to step 1.
 4) Reviewer marks patch review+.
 5) Committer decides the patch is ready to land.
 6) Patch built and tested against top-of-tree on at least one platform.
 7) If the patch fails to build or pass tests, return to step 1.
 8) Patch landed.
 9) If the patch turns any of the core builders red, patch is rolled
 out, return to step 1.

EWS has been a huge boon in productivity at least for us GTK+ folks,
so I fully support any step to increase its awesomeness! Of course
what we need to do is to work on increasing the number of core
builders, but that's an orthogonal issue and our own responsibility.

Cheers,

Xan




Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Kenneth Christiansen
 1) Contributor uploads patch and nominates it for review.
 2) Patch vetted by the EWS on numerous platforms.

When a non-committer uploads a patch, it is not vetted by the EWS. I
know that is due to security issues. It would be really nice to have an
option for a reviewer to approve it to run on the EWS.

Kenneth


Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Eric Seidel
On Fri, Feb 26, 2010 at 4:14 AM, Kenneth Christiansen
kenneth.christian...@openbossa.org wrote:
 1) Contributor uploads patch and nominates it for review.
 2) Patch vetted by the EWS on numerous platforms.

 When a non-committer uploads a patch, it is not vetted by the EWS. I
 know that is due to security issues. It would be really nice to have an
 option for a reviewer to approve it to run on the EWS.

The only EWS which requires committer access is Mac-EWS.  All other
EWS bots will run any patch.

-eric


Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Alex Milowski
On Fri, Feb 26, 2010 at 7:09 AM, Eric Seidel e...@webkit.org wrote:
 The only EWS which requires committer access is Mac-EWS.  All other
 EWS bots will run any patch.

Why is that?  That's the platform I'm most interested in seeing run.

-- 
--Alex Milowski
The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered.

Bertrand Russell in a footnote of Principles of Mathematics


Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Eric Seidel
On Fri, Feb 26, 2010 at 7:12 AM, Alex Milowski a...@milowski.com wrote:
 The only EWS which requires committer access is Mac-EWS.  All other
 EWS bots will run any patch.

 Why is that?  That's the platform I'm most interested in seeing run.

Various reasons.  Mostly due to our current hardware setup.  If
someone has some mac hardware they'd like to donate to the cause it
would be most welcome.

-eric


Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Alex Milowski
On Fri, Feb 26, 2010 at 7:17 AM, Eric Seidel e...@webkit.org wrote:
 Various reasons.  Mostly due to our current hardware setup.  If
 someone has some mac hardware they'd like to donate to the cause it
 would be most welcome.

That seems really, really solvable.



Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Adam Barth
On Fri, Feb 26, 2010 at 7:24 AM, Alex Milowski a...@milowski.com wrote:
 That seems really, really solvable.

The core issue here is that the license for Mac OS X prevents us from
running the OS in a virtual machine.  The way we protect ourselves
from random folks haxoring the EWS on Linux is by running the bots on
EC2 and re-imaging the machines periodically.

If you'd like to donate hardware that you're willing to have random
folks run code on, please let me or Eric know and we'll show you how
to get the mac-ews up and running.

Adam


Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Adam Barth
On Fri, Feb 26, 2010 at 8:47 AM, Alex Milowski a...@milowski.com wrote:
 So, it is possible to run Mac OS X on a virtual machine:

Oh, awesome!

 The real issue is you can't run this in the cloud like on an EC2 server
 because of the hardware restriction in Apple's license, right?

EC2 has support for Linux and Windows, but not Mac.  I have been
meaning to set up a Windows box, but I haven't gotten around to it
yet.  If you know of a cloud provider that has Mac, we can set up the
mac-ews there.

 If you'd like to donate hardware that you're willing to have random
 folks run code on, please let me or Eric know and we'll show you how
 to get the mac-ews up and running.


 I have limited bandwidth where I'm at and so hosting something, while
 possible, needs careful consideration.  I've contemplated running something
 like EWS for my own work so I'd be interested in learning how this works.

Amazon tells me that our current bots use about 4 GB/month of download
bandwidth and 600 MB/month of upload bandwidth.  I presume almost all
of the bandwidth is to update the working copies of the four bots
hosted there.

 ...but will just one server out there somewhere solve this problem?  Don't
 we need several?

It depends on how beefy your server is, but one server is probably
fine.  The current mac-ews is running on one machine and has no
trouble keeping up with the load.

Adam


Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Adam Barth
On Fri, Feb 26, 2010 at 8:55 AM, Adam Barth aba...@webkit.org wrote:
 Amazon tells me that our current bots use about 4 GB/month of download
 bandwidth and 600 MB/month of upload bandwidth.  I presume almost all
 of the bandwidth is to update the working copies of the four bots
 hosted there.

In case you're curious, Amazon charges us 9 cents/month for that much bandwidth.

Adam


Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Kenneth Christiansen
That is some of the best 9 cents spent ever!

Kenneth

-- 
Kenneth Rohde Christiansen
Technical Lead / Senior Software Engineer
Qt Labs Americas, Nokia Technology Institute, INdT
Phone  +55 81 8895 6002 / E-mail kenneth.christiansen at openbossa.org


Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Adam Barth
Well, the total bill is a bit bigger, but yeah.  :)

Adam




Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Dimitri Glazkov
To summarize the thread:

1) We're adopting a "when in doubt, roll it out" approach to patches
that turn the tree red.
2) Need to find a way to run Mac-EWS for non-committers.
3) Enable build-break emails to webkit-dev or another opt-in mailing list.

What else?

:DG



Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Maciej Stachowiak


On Feb 26, 2010, at 9:17 AM, Dimitri Glazkov wrote:


To summarize the thread:

1) We're adopting a "when in doubt, roll it out" approach to patches
that turn the tree red.


I think it's polite, though not mandatory, to make a reasonable effort
to find the person responsible for the breakage and give them a chance
to fix it. (This doesn't have to mean hunting around for hours or
days, but you could send email or ask on IRC.) It is also acceptable to
fix it yourself, if it is obvious how.



2) Need to find a way to run Mac-EWS for non-committers.
3) Enable build-break emails to webkit-dev or another opt-in mailing list


What else?


I'd like it if we had an IRC bot that announced build breakage on  
#webkit.
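
To make the idea concrete, the core of such a bot is just noticing when a builder goes from green to red between two polls. This is a rough sketch under assumptions: the snapshot format (a dict of builder name to "green" or "red"), the builder names, and the message wording are all hypothetical, and fetching status from the buildbot or posting to IRC is deliberately left out.

```python
def breakage_announcements(previous, current):
    """Return one message per builder that went from green to red.

    Both arguments map a builder name to "green" or "red". A builder that
    is absent from the previous snapshot is treated as having been green.
    """
    messages = []
    for builder in sorted(current):
        was = previous.get(builder, "green")
        now = current[builder]
        if was == "green" and now == "red":
            messages.append(f"{builder} just turned red")
    return messages
```

Announcing only the transition, rather than every red poll, keeps a long-broken builder from flooding the channel.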


Regards,
Maciej



Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Nikolas Zimmermann


On 26.02.2010, at 18:17, Dimitri Glazkov wrote:


To summarize the thread:

1) We're adopting a "when in doubt, roll it out" approach to patches
that turn the tree red.
2) Need to find a way to run Mac-EWS for non-committers.
3) Enable build-break emails to webkit-dev or another opt-in mailing list


What else?


I'm a bit scared of rule 1. How about we define a minimum delay before
rolling out patches after they break something?
Let's say, if a commit breaks the tree, give the committer a time frame
of 30 minutes to fix it, otherwise roll out (we could even automate
that).


Example: When landing an SVG patch that worked fine on Leopard but
broke Snow Leopard, I'd like to have some time to identify whether it's
the fault of my patch or a platform-specific problem. If it's the fault
of my patch, I have no problem with reverting. But if I can't
immediately fix the problem because it's a platform-specific issue
that cannot be fixed (in terms of WebKit), then adding the failing test
to the Skipped list and filing a new bug just takes 5 minutes. Reverting
the whole patch, just to re-land it with a Skipped list addition, is a
bit too much work for me.


What do others think?

Cheers,
Niko



Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Maciej Stachowiak


On Feb 26, 2010, at 1:36 AM, Adam Barth wrote:


I suspect most of our brokenness is coming from committers skipping
steps 6 and 7.


One data point: I broke the build this weekend, because I introduced a  
problem that affected debug builds but not release. I did a full  
release build on my own system before committing. When someone pointed  
out the breakage, I rolled the patch out myself until I could fix it.  
If the problems were such that I could fix them as quickly as rolling  
out, I would


I feel like the biggest failure in my case was that I forgot to look  
at the bot once my patch went through a cycle. This is why I wish it  
would do some form of more active notification. Sometimes I get  
distracted after committing and forget to keep hitting reload on the  
buildbot page.


Regards,
Maciej






Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Chris Jerdonek
On Fri, Feb 26, 2010 at 1:36 AM, Adam Barth aba...@webkit.org wrote:

 2) Require pre-commit vetting of patches.  We have the resources to

 Here's how I would design the life and times of a patch:

 1) Contributor uploads patch and nominates it for review.
 2) Patch vetted by the EWS on numerous platforms.
 3) If the EWS finds a problem, return to step 1.
 4) Reviewer marks patch review+.

It seems like this would preclude serial patches from getting reviewed together.

If I break a larger patch into smaller pieces for the benefit of the
reviewer (so that the second piece depends on the first getting
committed, etc), it seems like this process would mean that the second
piece can't get reviewed until the first piece is committed.

It seems like the committer should be allowed to decide when (2) and
(3) happen relative to the other steps -- provided it happens some
time before landing.

--Chris



Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Eric Seidel
The bots take 15 minutes to cycle.  From the moment the build is broken,
it's FIX_TIME + BOT_CYCLE_TIME until they're green again.

I think we should cap the fix grace period at something like 15
minutes; that means no more than 30 minutes of tree redness per break.
That might be too aggressive to start with for WebKit, but I think we
should move towards that.
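
As a quick check of that arithmetic, here is a tiny sketch. The 15-minute cycle time is the figure quoted above; the function and constant names are made up for illustration, and a roll-out is assumed to count as a "fix" for timing purposes.

```python
BOT_CYCLE_MINUTES = 15  # bot cycle time quoted in this message


def minutes_red(fix_minutes, bot_cycle_minutes=BOT_CYCLE_MINUTES):
    """Tree redness per break: time to land a fix (or roll-out) plus one
    full bot cycle for the builders to pick it up and go green."""
    return fix_minutes + bot_cycle_minutes
```

With the grace period capped at 15 minutes, `minutes_red(15)` gives the 30-minute worst case mentioned above.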

I would re-write rule one as something like this:
1.  Comment in the bugzilla bug when the build breaks.  If there is no
bugzilla bug, comment in #webkit.
2.  15 minutes after the break or 10 minutes after the comment, with
no reply from the breaker, roll out the patch.
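
One possible reading of that rule, sketched as code. The assumptions here are mine and not settled in this thread: "or" is taken to mean whichever deadline passes first, and any reply from the breaker stops the clock (what happens after a reply is left open). The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta

GRACE_AFTER_BREAK = timedelta(minutes=15)
GRACE_AFTER_COMMENT = timedelta(minutes=10)


def may_roll_out(now, broke_at, commented_at=None, breaker_replied=False):
    """Return True if the breaking patch may be rolled out at time `now`."""
    if breaker_replied:
        # Assumption: a reply means the breaker is handling it.
        return False
    deadlines = [broke_at + GRACE_AFTER_BREAK]
    if commented_at is not None:
        deadlines.append(commented_at + GRACE_AFTER_COMMENT)
    # Assumption: the earliest deadline wins.
    return now >= min(deadlines)
```

If the intended reading is instead "both deadlines must pass", replacing `min` with `max` gives that behaviour.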

-eric



Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Alex Milowski
On Fri, Feb 26, 2010 at 8:55 AM, Adam Barth aba...@webkit.org wrote:
 EC2 has support for Linux and Windows, but not Mac.  I have been
 meaning to set up a Windows box, but I haven't gotten around to it
 yet.  If you know of a cloud provider that has Mac, we can set up the
 mac-ews there.

The only non-dedicated server hosting provider I've found is GoDaddy:

   http://www.godaddy.com/hosting/mac-hosting.aspx

I don't know if starting/stopping instances is as easy as Amazon's EC2
service (which I use).  I've never used their virtual hosting service.



Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Dimitri Glazkov
On Fri, Feb 26, 2010 at 9:44 AM, Eric Seidel e...@webkit.org wrote:
 The bots take 15 minutes to cycle.  From the moment the build is broken,
 it's FIX_TIME + BOT_CYCLE_TIME until they're green again.

 I think we should cap the fix grace period at something like 15
 minutes, that means no more than 30 minutes of tree redness per break.
  That might be too aggressive to start with for WebKit, but I think we
 should move towards that.

 I would re-write rule one as something like this:
 1.  Comment in the bugzilla bug when the build breaks.  If there is no
 bugzilla bug, comment in #webkit.
 2.  15 minutes after the break or 10 minutes after the comment, with
 no reply from the breaker, roll out the patch.

Sounds great. Is this going to be a new page on webkit.org?

:DG



Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Jeremy Orlow
On Fri, Feb 26, 2010 at 6:47 PM, Dimitri Glazkov dglaz...@chromium.org wrote:

 On Fri, Feb 26, 2010 at 9:44 AM, Eric Seidel e...@webkit.org wrote:
  The bots take 15 minutes to cycle.  The moment the build is broken,
  that's FIX_TIME + BOT_CYCLE_TIME until they're green again.
 
  I think we should cap the fix grace period at something like 15
  minutes, that means no more than 30 minutes of tree redness per break.
   That might be too aggressive to start with for WebKit, but I think we
  should move towards that.
 
  I would re-write rule one as something like this:
  1.  Comment in the bugzilla bug when the build breaks.  If there is no
  bugzilla bug, comment in #webkit.
  2.  15 minutes after the break or 10 minutes after the comment, with
  no reply from the breaker, roll out the patch.

 Sounds great. Is this going to be a new page on webkit.org?


Agree it sounds like a good plan.
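
To make the timing in the quoted rule concrete, here is a rough sketch of how the rollout deadline could be computed. It assumes that "15 minutes after the break or 10 minutes after the comment" means whichever comes first, which is only one reading of the rule. The function and constant names are made up for illustration; nothing like this exists in the WebKit tools.

```python
from datetime import datetime, timedelta
from typing import Optional

# Numbers taken from the quoted proposal.
BREAK_GRACE = timedelta(minutes=15)     # grace period counted from the break
COMMENT_GRACE = timedelta(minutes=10)   # grace period counted from the comment
BOT_CYCLE_TIME = timedelta(minutes=15)  # time for the bots to cycle


def rollout_deadline(broke_at: datetime,
                     commented_at: Optional[datetime] = None) -> datetime:
    """Earliest time a rollout is allowed if the breaker has not replied.

    Assumption: "or" in the rule means whichever deadline comes first.
    """
    deadline = broke_at + BREAK_GRACE
    if commented_at is not None:
        deadline = min(deadline, commented_at + COMMENT_GRACE)
    return deadline


def worst_case_redness() -> timedelta:
    """FIX_TIME + BOT_CYCLE_TIME with the fix grace period capped."""
    return BREAK_GRACE + BOT_CYCLE_TIME
```

With the grace period capped at 15 minutes and a 15 minute bot cycle, worst_case_redness() gives the 30 minutes of redness per break mentioned above.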

Re the emails: who knows how to do that?  Can someone own this process to
completion and do it as soon as possible?  It'd be much appreciated!





Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Alexey Proskuryakov


On 26.02.2010, at 9:29, Maciej Stachowiak wrote:

I'd like it if we had an IRC bot that announced build breakage on  
#webkit.



Perhaps better yet, on #webkit-build, as buildbot used to do.

- WBR, Alexey Proskuryakov



Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Maciej Stachowiak


On Feb 26, 2010, at 9:58 AM, Alexey Proskuryakov wrote:



On 26.02.2010, at 9:50, Jeremy Orlow wrote:


 I would re-write rule one as something like this:
 1.  Comment in the bugzilla bug when the build breaks.  If there is no
 bugzilla bug, comment in #webkit.
 2.  15 minutes after the break or 10 minutes after the comment, with
 no reply from the breaker, roll out the patch.

Sounds great. Is this going to be a new page on webkit.org?

Agree it sounds like a good plan.



So, is the assumption that everyone reads bugmail immediately? When  
pinged on #webkit, I get an audible notification, but it's likely  
that I won't see bugmail until much later.


I suspect the odds of most people reading bugmail within 10 minutes  
are pretty low.


Regards,
Maciej



Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Maciej Stachowiak


On Feb 26, 2010, at 9:56 AM, Alexey Proskuryakov wrote:



On 26.02.2010, at 9:29, Maciej Stachowiak wrote:

I'd like it if we had an IRC bot that announced build breakage on  
#webkit.



Perhaps better yet, on #webkit-build, as buildbot used to do.


In the past, no one ever joined #webkit-build so this was not an  
effective means of notification.


Regards,
Maciej



Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Jeremy Orlow
On Fri, Feb 26, 2010 at 7:06 PM, Maciej Stachowiak m...@apple.com wrote:


 On Feb 26, 2010, at 9:56 AM, Alexey Proskuryakov wrote:


 On 26.02.2010, at 9:29, Maciej Stachowiak wrote:

  I'd like it if we had an IRC bot that announced build breakage on
 #webkit.



 Perhaps better yet, on #webkit-build, as buildbot used to do.


 In the past, no one ever joined #webkit-build so this was not an effective
 means of notification.


I didn't even know it existed until now.  Was there ever an email sent out
on this?  If so, I missed it.


Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Geoffrey Garen
I think it would be more productive to start with better systems for informing 
people that they've broken something, and move on to rolling out patches 
aggressively if informing people doesn't work.

It's not surprising that people neglect a red tree when they don't know about 
it.

A lot of the proposals on this thread would interfere with this work flow:

1. Finish patch and get it working on local machine.
2. Check in, automatically test for compatibility on other machines and OS's in 
parallel, resolving unexpected problems as they arise.

and change it to this work flow:

0. Purchase and set up about 15 different build environments.
1. Finish patch and get it working on local machine.
2. Manually test on build environments purchased and set up in (0).
3. Check in.

That would be a serious blow to productivity -- probably a cure that is worse 
than the disease.

Bear in mind that the build environments problem is multiplied by Google's 
choice to use a separate JavaScript engine, which effectively almost doubles 
the testing surface area.

Geoff



Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Enrica Casucci
I didn't know that failing tests would block the commit queue. I saw they were 
failing yesterday afternoon and I thought it was ok to wait until this morning 
to fix them.
My apologies for the inconvenience.
I believe a reasonable approach to handle these situations is to try to contact 
the person responsible for breaking the tests on IRC and, if there is no response 
within an hour, roll back.
I believe that requiring everyone to run the layout tests (the entire suite) 
before committing is the right thing to do.
The only time I haven't done it was yesterday :-(.
Lesson learned.

Enrica



Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Alexey Proskuryakov


On 26.02.2010, at 10:15, Jeremy Orlow wrote:

I didn't even know it existed until now.  Was there ever an email  
sent out on this?  If so, I missed it.



Buildbot used to announce results there, but it was a few years ago.  
My recollection is that when it worked, about half of active  
committers actually joined the channel. I still do, because I'm too  
lazy to remove it from my auto-connect list :)


Buildbot was also listening to commands on this channel, which I think  
worked as of several months ago. But that no longer works either.


- WBR, Alexey Proskuryakov



Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Ojan Vafai
On Fri, Feb 26, 2010 at 10:40 AM, Geoffrey Garen gga...@apple.com wrote:

 A lot of the proposals on this thread would interfere with this work flow:

 1. Finish patch and get it working on local machine.
 2. Check in, automatically test for compatibility on other machines and
 OS's in parallel, resolving unexpected problems as they arise.


There is a non-trivial cost of this workflow on the rest of the team.
-keeps the commit-queue from running
-often results in new test failures going unnoticed because the tree is
already red
-we can't generally trust that all the tests should pass locally

Clearly, every developer having access to every environment and knowing how
to setup/build/test on each environment is not an option.

Would it be enough for you if you could send a patch to the EWS and get back
the results for any test failures? This would let you maintain the above
workflow without actually committing. Adam/Eric, how close is the EWS to
enabling that? The missing pieces as I see it are:

1. Running the layout tests as part of the EWS.
2. Giving access to the results of any failing tests.



Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Geoffrey Garen
 There is a non-trivial cost of this workflow on the rest of the team.
 -keeps the commit-queue from running
 -often results in new test failures going unnoticed because the tree is 
 already red
 -we can't generally trust that all the tests should pass locally

I think all of the costs you list fundamentally derive from failures going 
unnoticed. That's the rationale for my suggestion that we start by making sure 
that failures are noticed.

 Would it be enough for you if you could send a patch to the EWS and get back 
 the results for any test failures?

It would certainly be very helpful.

I don't know if it would be enough to make me think a harsh policy of rolling 
out patches was a good idea.

But if we had a good system for making failures noticed, and a working EWS, and 
we still had problems with a red tree, I'm sure I would support some further 
action to solve the problem.

Geoff


Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Maciej Stachowiak


On Feb 26, 2010, at 11:08 AM, Alexey Proskuryakov wrote:



On 26.02.2010, at 10:15, Jeremy Orlow wrote:

I didn't even know it existed until now.  Was there ever an email  
sent out on this?  If so, I missed it.



Buildbot used to announce results there, but it was a few years ago.  
My recollection is that when it worked, about half of active  
committers actually joined the channel. I still do, because I'm too  
lazy to remove it from my auto-connect list :)


Buildbot was also listening to commands on this channel, which I  
think worked as of several months ago. But that no longer works  
either.


I believe it announced successes as well as failures, which somewhat  
limited the utility. I think notice only of failures (or returning to  
green after previous failures), plus mention of the blameworthy  
committer's IRC nick, would make a much better notification system.
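
As a rough illustration of that filtering, here is a minimal sketch of the decision such a bot would make for each finished build. The function name, its arguments and the message wording are all hypothetical; this is not buildbot's API, only the rule described above written out.

```python
from typing import Optional


def announcement(builder: str, was_green: bool, is_green: bool,
                 committer_nick: str) -> Optional[str]:
    """Return the IRC line to post for a finished build, or None to stay quiet.

    Failures are announced with the committer's nick, a return to green
    after a failure is announced once, and ordinary successes are silent.
    """
    if not is_green:
        return f"{committer_nick}: {builder} is red"
    if not was_green:
        return f"{builder} is green again"
    return None
```

A green build following another green build returns None, so the channel only sees traffic when something needs attention.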


Regards,
Maciej



Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Maciej Stachowiak


On Feb 26, 2010, at 11:34 AM, Geoffrey Garen wrote:


There is a non-trivial cost of this workflow on the rest of the team.
-keeps the commit-queue from running
-often results in new test failures going unnoticed because the  
tree is already red
-we can't generally trust that all the tests should pass locally


I think all of the costs you list fundamentally derive from  
failures going unnoticed. That's the rationale for my suggestion  
that we start by making sure that failures are noticed.


I strongly agree with Geoff that our first step should be to make  
failures more visible.


But if we had a good system for making failures noticed, and a  
working EWS, and we still had problems with a red tree, I'm sure I  
would support some further action to solve the problem.


I agree with this as well.

One goal I have always had for the WebKit project is to have the  
minimum amount of policy necessary for the project to run smoothly. It  
seems good to me that we have less in the way of rules and bureaucracy  
than other open source projects of a similar scale. As the project  
grows, we will certainly need some additional policy. But I would  
prefer to take it in steps. It seems to me like making failures more  
visible would go a long way.



Regards,
Maciej



Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Simon Fraser
On Feb 26, 2010, at 11:34 AM, Geoffrey Garen wrote:

 There is a non-trivial cost of this workflow on the rest of the team.
 -keeps the commit-queue from running
 -often results in new test failures going unnoticed because the tree is 
 already red
 -we can't generally trust that all the tests should pass locally
 
 I think all of the costs you list fundamentally derive from failures going 
 unnoticed. That's the rationale for my suggestion that we start by making 
 sure that failures are noticed.
 
 Would it be enough for you if you could send a patch to the EWS and get back 
 the results for any test failures?
 
 It would certainly be very helpful.
 
 I don't know if it would be enough to make me think a harsh policy of rolling 
 out patches was a good idea.
 
 But if we had a good system for making failures noticed, and a working EWS, 
 and we still had problems with a red tree, I'm sure I would support some 
 further action to solve the problem.

Mozilla has (or at least had when I worked there) two additional tree rules 
that helped keep the tree green:

1. A sheriff was appointed at all times, and had the authority to close the 
tree if there was significant build or test breakage. Closing the tree meant 
that it was blocked to new commits other than those intended to fix problems. 
Closing the tree also sends a strong message that something is broken, please 
pitch in and fix it if you can.

Sheriff duties were shared around between responsible committers, so as not to 
overly burden one person.

2. The Mozilla tinderbox page (their buildbot waterfall) had a way for people 
to leave comments, by adding a star to a particular build with a comment. 
This is used as a way to communicate that someone has noticed the breakage, and 
is working on it.

In general, I think the waterfall page could be improved in order to make 
breakage archeology easier. Entries in the Changes column should be direct 
links to trac changesets, for example.

Simon



Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Maciej Stachowiak


On Feb 26, 2010, at 11:43 AM, Simon Fraser wrote:



Mozilla has (or at least had when I worked there) two additional  
tree rules that helped keep the tree green:


1. A sheriff was appointed at all times, and had the authority to  
close the tree if there was significant build or test breakage.  
Closing the tree meant that it was blocked to new commits other than  
those intended to fix problems. Closing the tree also sends a strong  
message that something is broken, please pitch in and fix it if you  
can.


Sheriff duties were shared around between responsible committers, so  
as not to overly burden one person.


I think the build sheriff idea is a good one. Maybe what we want is to  
have a sheriff responsible for each build train that has an active  
buildbot. (It could be the same person responsible for several build  
trains, the main qualification would be having reasonable familiarity  
with a port and access to its build environment.)


However, I am not so sure "close the tree" is necessarily the best  
focus for sheriff actions. What I'd prefer to see is that the sheriff  
is the person primarily responsible for reverting broken patches if not  
fixed in a timely manner. Then we could have some human judgment in  
the process and specific people with clear responsibility.


2. The Mozilla tinderbox page (their buildbot waterfall) had a way  
for people to leave comments, by adding a star to a particular  
build with a comment. This is used as a way to communicate that  
someone has noticed the breakage, and is working on it.


Sounds like a good idea. Wondering if that fits better in the console  
view or the extensions view.




In general, I think the waterfall page could be improved in order to  
make breakage archeology easier. Entries in the Changes column  
should be direct links to trac changesets, for example.


That sounds good too. Another thing that would help is adding next  
page links to the console view, like we have on the waterfall. The  
console link often makes it easier to quickly identify the patch that  
went bad, but only if the badness is recent enough to show up.


Regards,
Maciej



Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Jeremy Orlow
On Fri, Feb 26, 2010 at 8:53 PM, Maciej Stachowiak m...@apple.com wrote:


 On Feb 26, 2010, at 11:43 AM, Simon Fraser wrote:


 Mozilla has (or at least had when I worked there) two additional tree
 rules that helped keep the tree green:

 1. A sheriff was appointed at all times, and had the authority to close
 the tree if there was significant build or test breakage. Closing the tree
 meant that it was blocked to new commits other than those intended to fix
 problems. Closing the tree also sends a strong message that something is
 broken, please pitch in and fix it if you can.

 Sheriff duties were shared around between responsible committers, so as
 not to overly burden one person.


 I think the build sheriff idea is a good one. Maybe what we want is to have
 a sheriff responsible for each build train that has an active buildbot. (It
 could be the same person responsible for several build trains, the main
 qualification would be having reasonable familiarity with a port and access
 to its build environment.)

 However, I am not so sure close the tree is necessarily the best focus
 for sheriff actions. What I'd prefer to see is that the sheriff the person
 primarily responsible for reverting broken patches if not fixed in a timely
 manner. Then we could have some human judgment in the process and specific
 people with clear responsibility.


I agree close to the tree is not necessary for the reasons you listed.
 And I think most people from the Chromium would welcome this change
(sheriff + ability to close).  We've been advocating it for some time now.
 :-)




Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread William Siegrist
On Feb 26, 2010, at 9:29 AM, Maciej Stachowiak wrote:

 
 On Feb 26, 2010, at 9:17 AM, Dimitri Glazkov wrote:
 
 To summarize the thread:
 
 1) We're adopting when in doubt, roll it out approach to patches
 that turn tree red.
 
 I think it's polite, though not mandatory, to make a reasonable effort to 
 find the person responsible for the breakage and give them a chance to fix 
 it. (This doesn't have to mean hunting around for hours or days, but you 
 could send email or ask on IRC.) Also acceptable to fix it yourself, if it is 
 obvious how.
 
 2) Need to find a way to run Mac-EWS for non-committers.
 3) Enable build-break emails to webkit-dev or another opt-in mailing list
 
 What else?
 
 I'd like it if we had an IRC bot that announced build breakage on #webkit.
 


The buildbot master lives on hardware that cannot host IRC bots, at least by 
default. I'd rather the bot be external to the master, but if you really need a 
bot on that hardware, I can start the request process. 

-Bill


Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Jeremy Orlow
On Fri, Feb 26, 2010 at 9:00 PM, Jeremy Orlow jor...@chromium.org wrote:

 On Fri, Feb 26, 2010 at 8:53 PM, Maciej Stachowiak m...@apple.com wrote:


 On Feb 26, 2010, at 11:43 AM, Simon Fraser wrote:


 Mozilla has (or at least had when I worked there) two additional tree
 rules that helped keep the tree green:

 1. A sheriff was appointed at all times, and had the authority to close
 the tree if there was significant build or test breakage. Closing the tree
 meant that it was blocked to new commits other than those intended to fix
 problems. Closing the tree also sends a strong message that something is
 broken, please pitch in and fix it if you can.

 Sheriff duties were shared around between responsible committers, so as
 not to overly burden one person.


 I think the build sheriff idea is a good one. Maybe what we want is to
 have a sheriff responsible for each build train that has an active buildbot.
 (It could be the same person responsible for several build trains, the main
 qualification would be having reasonable familiarity with a port and access
 to its build environment.)

 However, I am not so sure close the tree is necessarily the best focus
 for sheriff actions. What I'd prefer to see is that the sheriff the person
 primarily responsible for reverting broken patches if not fixed in a timely
 manner. Then we could have some human judgment in the process and specific
 people with clear responsibility.


 I agree close to the tree is not necessary for the reasons you listed.
  And I think most people from the Chromium would welcome this change
 (sheriff + ability to close).  We've been advocating it for some time now.
  :-)


Oops... I completely misread what you said.

The reason why being able to close the tree is important is that
sometimes it can take a while to sort out what caused which failures.  And
it's important not to allow more breakage in the meantime.  In Chromium, we
often have a good deal of redness, but as long as the sheriffs feel as
though they're on top of it, the tree stays open.  Now, I'll admit that we
have many more long-running bots (like memory leak bots), and so these kinds
of train wrecks that require sorting out happen far less often in WebKit, but
it still might be nice to have the ability when necessary.

The suggestion below (2) about notes on the waterfall sounds great, but we
do OK by abusing the "tree is closed/open" string to keep track of other
state (like who's working on what fix).  We've found this works well
enough.  And maybe some informal banner like this would be good enough for
the first rev, unless we thought per-CL annotations would be easy to
implement.

I'll note that in the Chromium project, we've had a very strong "keep the
tree green" ethic for some time now, and we have a good deal of experience
with it.  Certainly there are multiple ways to solve various problems,
but it might be worth taking a look at how we do things to see whether
other parts of our process might be of interest.


  2. The Mozilla tinderbox page (their buildbot waterfall) had a way for
 people to leave comments, by adding a star to a particular build with a
 comment. This is used as a way to communicate that someone has noticed the
 breakage, and is working on it.


 Sounds like a good idea. Wondering if that fits better in the console view
 or the extensions view.



 In general, I think the waterfall page could be improved in order to make
 breakage archeology easier. Entries in the Changes column should be direct
 links to trac changesets, for example.


 That sounds good too. Another thing that would help is adding next page
 links to the console view, like we have on the waterfall. The console link
 often makes it easier to quickly identify the patch that went bad, but only
 if the badness is recent enough to show up.

 Regards,
 Maciej




Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Ojan Vafai
On Fri, Feb 26, 2010 at 11:53 AM, Maciej Stachowiak m...@apple.com wrote:

 On Feb 26, 2010, at 11:43 AM, Simon Fraser wrote:

2. The Mozilla tinderbox page (their buildbot waterfall) had a way for
 people to leave comments, by adding a star to a particular build with a
 comment. This is used as a way to communicate that someone has noticed the
 breakage, and is working on it.

 Sounds like a good idea. Wondering if that fits better in the console view
 or the extensions view.


Another, perhaps easier to implement, approach would be to have a single
status message that is iframed at the top of the waterfall and console
pages. This has proven good enough for Chromium. See the message at the top
of build.chromium.org:

http://chromium-status.appspot.com/current

The status can then be updated at http://chromium-status.appspot.com/
(requires login; not sure why), which also shows the last 25 statuses.

People use it in ways like "2 win release failures - ojan, mac compile -
dglazkov, qt failure - ???" to indicate that ojan/dglazkov are currently
actively fixing those and qt has a failure that needs an owner.
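
To make the convention above concrete, here is a minimal sketch of how such a
status line could be split into failures and owners. It assumes entries are
comma-separated and that " - " separates a failure from its owner, as in the
example message; the function names parse_status and unowned are hypothetical
and are not part of chromium-status.

```python
def parse_status(message):
    """Split a free-form status line into (failure, owner) pairs.

    Assumes entries are separated by commas and each entry has the form
    "failure - owner". An owner of "???" (or a missing owner) is
    reported as None, meaning the failure still needs an owner.
    """
    entries = []
    for part in message.split(","):
        part = part.strip()
        if not part:
            continue
        failure, sep, owner = part.rpartition(" - ")
        if not sep:
            # No " - " separator: treat the whole entry as an unowned failure.
            entries.append((part, None))
            continue
        owner = owner.strip()
        entries.append((failure.strip(), None if owner == "???" else owner))
    return entries


def unowned(message):
    """Return the failures in the status line that nobody has claimed."""
    return [failure for failure, owner in parse_status(message) if owner is None]


status = "2 win release failures - ojan, mac compile - dglazkov, qt failure - ???"
for failure, owner in parse_status(status):
    print(failure, "->", owner or "needs an owner")
```

The point of the sketch is only that a loose, human-written convention is
already structured enough to answer "which failures still need an owner?"
without any per-build annotation support.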

For the record, I fully support making warnings more visible and improving
the EWS/buildbot infrastructure before resorting to adding new policies.

On the topic of buildbot infrastructure, one problem I've had is that the
bots sometimes fall quite far behind. I made a commit last week that took
*hours* before the tests ran on the Windows bot. Sitting around for 30
minutes to see the tree green after a commit is one thing; sitting around
for 4 hours is another. Hopefully, running tests in parallel will resolve
many of these issues.

Ojan


Re: [webkit-dev] The tree is on fire: a tragedy of the commons

2010-02-26 Thread Maciej Stachowiak


On Feb 26, 2010, at 12:11 PM, William Siegrist wrote:


On Feb 26, 2010, at 9:29 AM, Maciej Stachowiak wrote:



On Feb 26, 2010, at 9:17 AM, Dimitri Glazkov wrote:


To summarize the thread:

1) We're adopting a "when in doubt, roll it out" approach to patches
that turn the tree red.


I think it's polite, though not mandatory, to make a reasonable
effort to find the person responsible for the breakage and give
them a chance to fix it. (This doesn't have to mean hunting around
for hours or days, but you could send email or ask on IRC.) It is
also acceptable to fix it yourself, if it is obvious how.



2) Need to find a way to run Mac-EWS for non-committers.
3) Enable build-break emails to webkit-dev or another opt-in  
mailing list


What else?


I'd like it if we had an IRC bot that announced build breakage on  
#webkit.





The buildbot master lives on hardware that cannot host IRC bots, at  
least by default. I'd rather the bot be external to the master, but  
if you really need a bot on that hardware, I can start the request  
process.


As long as the master can notify whatever host is running the bot, it
seems to me that it doesn't matter much whether the bot runs on the
same hardware.


I'm not really up on the internal details of buildbot, so I am not  
sure what would be easier to implement.
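
As a rough illustration of the external-bot arrangement discussed above, here
is a minimal sketch of the logic such a bot might run once it has been
notified. It assumes the bot host receives a mapping of builder name to
pass/fail for the previous and current runs; that interface, the builder
names, and the revision number are hypothetical and are not a buildbot API.
Only the "PRIVMSG <channel> :<text>" line format comes from the IRC protocol.

```python
def newly_broken(previous, current):
    """Return the builders that went from passing to failing.

    Both arguments map builder name -> True (green) or False (red).
    A builder missing from `previous` is assumed to have been green,
    so a brand-new red builder is also reported.
    """
    return sorted(
        name
        for name, is_green in current.items()
        if not is_green and previous.get(name, True)
    )


def announcement(builder, revision, channel="#webkit"):
    """Format the raw IRC line announcing that a builder turned red."""
    return "PRIVMSG %s :%s went red at r%d" % (channel, builder, revision)


# Hypothetical builder names and revision, for illustration only.
before = {"mac": True, "qt": True, "win": False}
after = {"mac": False, "qt": True, "win": False}
for builder in newly_broken(before, after):
    print(announcement(builder, 12345))
```

Announcing only green-to-red transitions, rather than every red build, keeps
the channel from being flooded while a builder stays broken.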


Regards,
Maciej
