[webkit-dev] The tree is on fire: a tragedy of the commons
Not to point fingers, but we've been having trouble keeping build.webkit.org green these past few weeks. As I write this message, every platform is broken, again. As the project scales, polluting the build with brokenness impacts more developers and drains more productivity. Here are some approaches we could use to turn this tragedy of the commons around:

1) Adopt a "rollout first, ask questions later" ethic. The vast majority of changes are not important enough to break the build for everyone else. If we adopt this ethic, committers would feel free to roll out brokenness to unbreak the build, and contributors shouldn't be offended if their patch is rolled out without their knowledge. We can always re-land the broken patch later once it actually works.

2) Require pre-commit vetting of patches. We have the resources to build and test every patch on at least one platform before landing the patch in the main tree. Vetting patches before landing will help us avoid breaking every platform at once. Once the patch has been vetted, it can either be landed mechanically (i.e., by commit-queue) or manually.

Here's how I would design the life and times of a patch:

1) Contributor uploads patch and nominates it for review.
2) Patch vetted by the EWS on numerous platforms.
3) If the EWS finds a problem, return to step 1.
4) Reviewer marks patch review+.
5) Committer decides the patch is ready to land.
6) Patch built and tested against top-of-tree on at least one platform.
7) If the patch fails to build or pass tests, return to step 1.
8) Patch landed.
9) If the patch turns any of the core builders red, patch is rolled out, return to step 1.

I suspect most of our brokenness is coming from committers skipping steps 6 and 7.

Adam
___ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
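As an illustration only, the nine-step patch lifecycle proposed above can be read as a small state machine in which every failed check sends the patch back to step 1. The sketch below is not part of any WebKit tooling; the names `Step`, `HAPPY_PATH`, and `next_step` are hypothetical and exist only to make the flow concrete.

```python
from enum import Enum, auto


class Step(Enum):
    UPLOADED = auto()        # 1) patch uploaded and nominated for review
    EWS_VETTED = auto()      # 2) EWS built the patch on several platforms
    REVIEWED = auto()        # 4) reviewer marked the patch review+
    READY_TO_LAND = auto()   # 5) committer decided the patch is ready
    PRE_COMMIT_OK = auto()   # 6) built and tested against top-of-tree
    LANDED = auto()          # 8) patch landed


# Order in which a patch advances when every check passes (hypothetical name).
HAPPY_PATH = [
    Step.UPLOADED,
    Step.EWS_VETTED,
    Step.REVIEWED,
    Step.READY_TO_LAND,
    Step.PRE_COMMIT_OK,
    Step.LANDED,
]


def next_step(current: Step, check_passed: bool) -> Step:
    """Advance one step, or return to step 1 when a check fails.

    Steps 3, 7 and 9 of the proposal are the failure branches: an EWS
    failure, a pre-commit failure, or a red core builder all send the
    patch back to the contributor.
    """
    if not check_passed:
        return Step.UPLOADED
    index = HAPPY_PATH.index(current)
    # A landed patch that keeps the builders green simply stays landed.
    return HAPPY_PATH[min(index + 1, len(HAPPY_PATH) - 1)]
```

Read this way, skipping steps 6 and 7 amounts to jumping from `READY_TO_LAND` straight to `LANDED`, which leaves the post-landing rollout in step 9 as the only remaining safety net.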
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Fri, Feb 26, 2010 at 10:36 AM, Adam Barth aba...@webkit.org wrote: Not to point fingers, but we've been having trouble keeping build.webkit.org green these past few weeks. As I write this message, every platform is broken, again. As the project scales, polluting the build with brokenness impacts more developers and drains more productivity. Here are some approaches we could use to turn this tragedy of the commons around: 1) Adopt a rollout first, ask questions later ethic. The vast majority of changes are not important enough to break the build for everyone else. If we adopt a rollout first, ask questions later ethic, committers would feel free to rollout brokenness to unbreak the build and contributors shouldn't be offended if their patch is rolled out without their knowledge. We can always re-land the broken patch later once it actually works. 2) Require pre-commit vetting of patches. We have the resources to build and test every patch on at least one platform before landing the patch in the main tree. Vetting patches before landing will help us avoid breaking every platform at once. Once the patch has been vetted, it can either be landed mechanically (i.e., by commit-queue) or manually. Here's how I would design the life and times of a patch: 1) Contributor uploads patch and nominates it for review. 2) Patch vetted by the EWS on numerous platforms. 3) If the EWS finds a problem, return to step 1. 4) Reviewer marks patch review+. 5) Committer decides the patch is ready to land. 6) Patch built and tested against top-of-tree on at least one platform. 7) If the patch fail to build or pass tests, return to step 1. 8) Patch landed. 9) If the patch turns any of the core builders red, patch is rolled out, return to step 1. I suspect most of our brokenness coming from committers skipping steps 6 and 7. LGTM. The only thing I'd add is that we REALLY need emails to start going out to webkit-dev (and ideally the suspected patch owners as well) when things do break. 
What is doing this blocked on?
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Fri, Feb 26, 2010 at 11:36 AM, Adam Barth aba...@webkit.org wrote: Not to point fingers, but we've been having trouble keeping build.webkit.org green these past few weeks. As I write this message, every platform is broken, again. As the project scales, polluting the build with brokenness impacts more developers and drains more productivity. Here are some approaches we could use to turn this tragedy of the commons around: 1) Adopt a rollout first, ask questions later ethic. The vast majority of changes are not important enough to break the build for everyone else. If we adopt a rollout first, ask questions later ethic, committers would feel free to rollout brokenness to unbreak the build and contributors shouldn't be offended if their patch is rolled out without their knowledge. We can always re-land the broken patch later once it actually works. In my experience this is more or less the current policy, especially for build breakage (as opposed to test breakage). Maybe a bit less hardliner in that we usually try contact the culprit and give some time to fix issues, but I think there's no remorse in rolling out patches if there's brokenness and nobody working on fixing it. 2) Require pre-commit vetting of patches. We have the resources to build and test every patch on at least one platform before landing the patch in the main tree. Vetting patches before landing will help us avoid breaking every platform at once. Once the patch has been vetted, it can either be landed mechanically (i.e., by commit-queue) or manually. Here's how I would design the life and times of a patch: 1) Contributor uploads patch and nominates it for review. 2) Patch vetted by the EWS on numerous platforms. 3) If the EWS finds a problem, return to step 1. 4) Reviewer marks patch review+. 5) Committer decides the patch is ready to land. 6) Patch built and tested against top-of-tree on at least one platform. 7) If the patch fail to build or pass tests, return to step 1. 8) Patch landed. 
9) If the patch turns any of the core builders red, patch is rolled out, return to step 1. EWS has been a huge boon in productivity at least for us GTK+ folks, so I fully support any step to increase its awesomeness! Of course what we need to do is to work on increasing the number of core builders, but that's an orthogonal issue and our own responsibility. Cheers, Xan I suspect most of our brokenness is coming from committers skipping steps 6 and 7. Adam
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
1) Contributor uploads patch and nominates it for review. 2) Patch vetted by the EWS on numerous platforms. When a non-committer uploads a patch, it is not vetted by the EWS. I know that is due to security issues. It would be really nice to have an option for a reviewer to approve it to run on the EWS. Kenneth
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Fri, Feb 26, 2010 at 4:14 AM, Kenneth Christiansen kenneth.christian...@openbossa.org wrote: 1) Contributor uploads patch and nominates it for review. 2) Patch vetted by the EWS on numerous platforms. When a non-committer uploads a patch, it is not vetted by the EWS. I know that is due to security issues. It would be really nice to have an option for a reviewer to approve it to run on the EWS. The only EWS which requires committer access is Mac-EWS. All other EWS bots will run any patch. -eric
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Fri, Feb 26, 2010 at 7:09 AM, Eric Seidel e...@webkit.org wrote: On Fri, Feb 26, 2010 at 4:14 AM, Kenneth Christiansen kenneth.christian...@openbossa.org wrote: 1) Contributor uploads patch and nominates it for review. 2) Patch vetted by the EWS on numerous platforms. When a non-committer uploads a patch, it is not vetted by the EWS. I know that is due to security issues. It would be really nice to have an option for a reviewer to approve it to run on the EWS. The only EWS which requires committer access is Mac-EWS. All other EWS bots will run any patch. Why is that? That's the platform I'm most interested in seeing run. -- --Alex Milowski The excellence of grammar as a guide is proportional to the paucity of the inflexions, i.e. to the degree of analysis effected by the language considered. Bertrand Russell in a footnote of Principles of Mathematics
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Fri, Feb 26, 2010 at 7:12 AM, Alex Milowski a...@milowski.com wrote: The only EWS which requires committer access is Mac-EWS. All other EWS bots will run any patch. Why is that? That's the platform I'm most interested in seeing run. Various reasons. Mostly due to our current hardware setup. If someone has some Mac hardware they'd like to donate to the cause, it would be most welcome. -eric
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Fri, Feb 26, 2010 at 7:17 AM, Eric Seidel e...@webkit.org wrote: On Fri, Feb 26, 2010 at 7:12 AM, Alex Milowski a...@milowski.com wrote: The only EWS which requires committer access is Mac-EWS. All other EWS bots will run any patch. Why is that? That's the platform I'm most interested in seeing run. Various reasons. Mostly due to our current hardware setup. If someone has some Mac hardware they'd like to donate to the cause, it would be most welcome. That seems really, really solvable. -- --Alex Milowski The excellence of grammar as a guide is proportional to the paucity of the inflexions, i.e. to the degree of analysis effected by the language considered. Bertrand Russell in a footnote of Principles of Mathematics
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Fri, Feb 26, 2010 at 7:24 AM, Alex Milowski a...@milowski.com wrote: On Fri, Feb 26, 2010 at 7:17 AM, Eric Seidel e...@webkit.org wrote: On Fri, Feb 26, 2010 at 7:12 AM, Alex Milowski a...@milowski.com wrote: The only EWS which requires committer access is Mac-EWS. All other EWS bots will run any patch. Why is that? That's the platform I'm most interested in seeing run. Various reasons. Mostly due to our current hardware setup. If someone has some Mac hardware they'd like to donate to the cause, it would be most welcome. That seems really, really solvable. The core issue here is that the license for Mac OS X prevents us from running the OS in a virtual machine. The way we protect ourselves from random folks haxoring the EWS on Linux is by running the bots on EC2 and re-imaging the machines periodically. If you'd like to donate hardware that you're willing to have random folks run code on, please let me or Eric know and we'll show you how to get the mac-ews up and running. Adam
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Fri, Feb 26, 2010 at 8:47 AM, Alex Milowski a...@milowski.com wrote: On Fri, Feb 26, 2010 at 8:19 AM, Adam Barth aba...@webkit.org wrote: On Fri, Feb 26, 2010 at 7:24 AM, Alex Milowski a...@milowski.com wrote: On Fri, Feb 26, 2010 at 7:17 AM, Eric Seidel e...@webkit.org wrote: On Fri, Feb 26, 2010 at 7:12 AM, Alex Milowski a...@milowski.com wrote: The only EWS which requires committer access is Mac-EWS. All other EWS bots will run any patch. Why is that? That's the platform I'm most interested in see run. Various reasons. Mostly due to our current hardware setup. If someone has some mac hardware they'd like to donate to the cause it would be most welcome. That seems really, really solvable. The core issue here is that the license for Mac OS X prevents us from running the OS in a virtual machine. The way we protect ourselves from random folks haxoring the EWS on Linux is by running them on EC2 and re-imagining the machines periodically. So, it is possible to run Mac OS X on a virtual machine: Oh, awesome! The real issue is you can't run this in the cloud like on an EC2 server because of the hardware restriction in Apple's license, right? EC2 has support for Linux and Windows, but not Mac. I have been meaning to set up a Windows box, but I haven't gotten around to it yet. If you know of a cloud provider that has Mac, we can set up the mac-ews there. If you'd like to donate hardware that you're willing to have random folks run code on, please let me or Eric know and we'll show you how to get the mac-ews up and running. I have limited bandwidth where I'm at and so hosting something, while possible, needs careful consideration. I've contemplated running something like EWS for my own work so I'd be interested in learning how this work. Amazon tells me that our current bots use about 4 GB/month of download bandwidth and 600 MB/month of upload bandwidth. I presume almost all of the bandwidth is to update the working copies of the four bots hosted there. 
...but will just one server out there somewhere solve this problem? Don't we need several? It depends on how beefy your server is, but one server is probably fine. The current mac-ews is running on one machine and has no trouble keeping up with the load. Adam
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Fri, Feb 26, 2010 at 8:55 AM, Adam Barth aba...@webkit.org wrote: Amazon tells me that our current bots use about 4 GB/month of download bandwidth and 600 MB/month of upload bandwidth. I presume almost all of the bandwidth is to update the working copies of the four bots hosted there. In case you're curious, Amazon charges us 9 cents/month for that much bandwidth. Adam
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
That is some of the best 9 cents spent ever! Kenneth On Fri, Feb 26, 2010 at 1:58 PM, Adam Barth aba...@webkit.org wrote: On Fri, Feb 26, 2010 at 8:55 AM, Adam Barth aba...@webkit.org wrote: Amazon tells me that our current bots use about 4 GB/month of download bandwidth and 600 MB/month of upload bandwidth. I presume almost all of the bandwidth is to update the working copies of the four bots hosted there. In case you're curious, Amazon charges us 9 cents/month for that much bandwidth. Adam -- Kenneth Rohde Christiansen Technical Lead / Senior Software Engineer Qt Labs Americas, Nokia Technology Institute, INdT Phone +55 81 8895 6002 / E-mail kenneth.christiansen at openbossa.org
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
Well, the total bill is a bit bigger, but yeah. :) Adam On Fri, Feb 26, 2010 at 9:05 AM, Kenneth Christiansen kenneth.christian...@openbossa.org wrote: That is some of the best 9 cents spent ever! Kenneth On Fri, Feb 26, 2010 at 1:58 PM, Adam Barth aba...@webkit.org wrote: On Fri, Feb 26, 2010 at 8:55 AM, Adam Barth aba...@webkit.org wrote: Amazon tells me that our current bots use about 4 GB/month of download bandwidth and 600 MB/month of upload bandwidth. I presume almost all of the bandwidth is to update the working copies of the four bots hosted there. In case you're curious, Amazon charges us 9 cents/month for that much bandwidth. Adam
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
To summarize the thread: 1) We're adopting a "when in doubt, roll it out" approach to patches that turn the tree red. 2) We need to find a way to run Mac-EWS for non-committers. 3) Enable build-break emails to webkit-dev or another opt-in mailing list. What else? :DG On Fri, Feb 26, 2010 at 9:08 AM, Adam Barth aba...@webkit.org wrote: Well, the total bill is a bit bigger, but yeah. :) Adam On Fri, Feb 26, 2010 at 9:05 AM, Kenneth Christiansen kenneth.christian...@openbossa.org wrote: That is some of the best 9 cents spent ever! Kenneth On Fri, Feb 26, 2010 at 1:58 PM, Adam Barth aba...@webkit.org wrote: On Fri, Feb 26, 2010 at 8:55 AM, Adam Barth aba...@webkit.org wrote: Amazon tells me that our current bots use about 4 GB/month of download bandwidth and 600 MB/month of upload bandwidth. I presume almost all of the bandwidth is to update the working copies of the four bots hosted there. In case you're curious, Amazon charges us 9 cents/month for that much bandwidth. Adam
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Feb 26, 2010, at 9:17 AM, Dimitri Glazkov wrote: To summarize the thread: 1) We're adopting a "when in doubt, roll it out" approach to patches that turn the tree red. I think it's polite, though not mandatory, to make a reasonable effort to find the person responsible for the breakage and give them a chance to fix it. (This doesn't have to mean hunting around for hours or days, but you could send email or ask on IRC.) It is also acceptable to fix it yourself, if it is obvious how. 2) We need to find a way to run Mac-EWS for non-committers. 3) Enable build-break emails to webkit-dev or another opt-in mailing list. What else? I'd like it if we had an IRC bot that announced build breakage on #webkit. Regards, Maciej
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On 26.02.2010 at 18:17, Dimitri Glazkov wrote: To summarize the thread: 1) We're adopting a "when in doubt, roll it out" approach to patches that turn the tree red. 2) We need to find a way to run Mac-EWS for non-committers. 3) Enable build-break emails to webkit-dev or another opt-in mailing list. What else? I'm a bit scared of rule 1. How about we define a minimum delay before rolling out patches after they break something? Let's say, if a commit breaks the tree, give the committer a time frame of 30 minutes to fix it, otherwise roll out (we could even automate that). Example: When landing an SVG patch that worked fine on Leopard but broke Snow Leopard, I'd like to have some time to identify whether it's the fault of my patch or a platform-specific problem. If it's the fault of my patch, I have no problem with reverting. But if I can't immediately fix the problem, because it's a platform-specific issue which cannot be fixed (in terms of WebKit), then adding to the Skipped list and filing a new bug just takes 5 minutes. Reverting the whole patch, just to re-land it with a Skipped list addition, is a bit too much work for me. What do others think? Cheers, Niko
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Feb 26, 2010, at 1:36 AM, Adam Barth wrote: Not to point fingers, but we've been having trouble keeping build.webkit.org green these past few weeks. As I write this message, every platform is broken, again. As the project scales, polluting the build with brokenness impacts more developers and drains more productivity. Here are some approaches we could use to turn this tragedy of the commons around: 1) Adopt a rollout first, ask questions later ethic. The vast majority of changes are not important enough to break the build for everyone else. If we adopt a rollout first, ask questions later ethic, committers would feel free to rollout brokenness to unbreak the build and contributors shouldn't be offended if their patch is rolled out without their knowledge. We can always re-land the broken patch later once it actually works. 2) Require pre-commit vetting of patches. We have the resources to build and test every patch on at least one platform before landing the patch in the main tree. Vetting patches before landing will help us avoid breaking every platform at once. Once the patch has been vetted, it can either be landed mechanically (i.e., by commit-queue) or manually. Here's how I would design the life and times of a patch: 1) Contributor uploads patch and nominates it for review. 2) Patch vetted by the EWS on numerous platforms. 3) If the EWS finds a problem, return to step 1. 4) Reviewer marks patch review+. 5) Committer decides the patch is ready to land. 6) Patch built and tested against top-of-tree on at least one platform. 7) If the patch fail to build or pass tests, return to step 1. 8) Patch landed. 9) If the patch turns any of the core builders red, patch is rolled out, return to step 1. I suspect most of our brokenness coming from committers skipping steps 6 and 7. One data point: I broke the build this weekend, because I introduced a problem that affected debug builds but not release. I did a full release build on my own system before committing. 
When someone pointed out the breakage, I rolled the patch out myself until I could fix it. If the problems were such that I could fix them as quickly as rolling out, I would have done that instead. I feel like the biggest failure in my case was that I forgot to look at the bot once my patch went through a cycle. This is why I wish it would do some form of more active notification. Sometimes I get distracted after committing and forget to keep hitting reload on the buildbot page. Regards, Maciej
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Fri, Feb 26, 2010 at 1:36 AM, Adam Barth aba...@webkit.org wrote: 2) Require pre-commit vetting of patches. We have the resources to Here's how I would design the life and times of a patch: 1) Contributor uploads patch and nominates it for review. 2) Patch vetted by the EWS on numerous platforms. 3) If the EWS finds a problem, return to step 1. 4) Reviewer marks patch review+. It seems like this would preclude serial patches from getting reviewed together. If I break a larger patch into smaller pieces for the benefit of the reviewer (so that the second piece depends on the first getting committed, etc), it seems like this process would mean that the second piece can't get reviewed until the first piece is committed. It seems like the committer should be allowed to decide when (2) and (3) happen relative to the other steps -- provided it happens some time before landing. --Chris 5) Committer decides the patch is ready to land. 6) Patch built and tested against top-of-tree on at least one platform. 7) If the patch fail to build or pass tests, return to step 1. 8) Patch landed. 9) If the patch turns any of the core builders red, patch is rolled out, return to step 1. ___ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
The bots take 15 minutes to cycle. From the moment the build is broken, that's FIX_TIME + BOT_CYCLE_TIME until they're green again. I think we should cap the fix grace period at something like 15 minutes; that means no more than 30 minutes of tree redness per break. That might be too aggressive to start with for WebKit, but I think we should move towards that. I would re-write rule one as something like this: 1. Comment in the bugzilla bug when the build breaks. If there is no bugzilla bug, comment in #webkit. 2. 15 minutes after the break or 10 minutes after the comment, with no reply from the breaker, roll out the patch. -eric On Fri, Feb 26, 2010 at 9:32 AM, Nikolas Zimmermann zimmerm...@physik.rwth-aachen.de wrote: On 26.02.2010 at 18:17, Dimitri Glazkov wrote: To summarize the thread: 1) We're adopting a "when in doubt, roll it out" approach to patches that turn the tree red. 2) We need to find a way to run Mac-EWS for non-committers. 3) Enable build-break emails to webkit-dev or another opt-in mailing list. What else? I'm a bit scared of rule 1. How about we define a minimum delay before rolling out patches after they break something? Let's say, if a commit breaks the tree, give the committer a time frame of 30 minutes to fix it, otherwise roll out (we could even automate that). Example: When landing an SVG patch that worked fine on Leopard but broke Snow Leopard, I'd like to have some time to identify whether it's the fault of my patch or a platform-specific problem. If it's the fault of my patch, I have no problem with reverting. But if I can't immediately fix the problem, because it's a platform-specific issue which cannot be fixed (in terms of WebKit), then adding to the Skipped list and filing a new bug just takes 5 minutes. Reverting the whole patch, just to re-land it with a Skipped list addition, is a bit too much work for me. What do others think?
Cheers, Niko
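The timing rule Eric proposes in this message can be sketched in a few lines. This is only an illustration under stated assumptions: the "or" in "15 minutes after the break or 10 minutes after the comment" is read here as "whichever comes first", which the message does not spell out, and the function names `rollout_deadline` and `worst_case_redness` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

BOT_CYCLE_TIME = timedelta(minutes=15)       # from the message: bots take 15 minutes to cycle
GRACE_AFTER_BREAK = timedelta(minutes=15)    # proposed cap on the fix grace period
GRACE_AFTER_COMMENT = timedelta(minutes=10)  # proposed wait after notifying the breaker


def rollout_deadline(broke_at: datetime,
                     commented_at: Optional[datetime] = None) -> datetime:
    """Earliest time the patch may be rolled out if the breaker stays silent.

    Assumption: "15 minutes after the break or 10 minutes after the
    comment" is interpreted as whichever of the two comes first.
    """
    deadline = broke_at + GRACE_AFTER_BREAK
    if commented_at is not None:
        deadline = min(deadline, commented_at + GRACE_AFTER_COMMENT)
    return deadline


def worst_case_redness(fix_time: timedelta = GRACE_AFTER_BREAK) -> timedelta:
    """FIX_TIME + BOT_CYCLE_TIME: how long the tree stays red per break."""
    return fix_time + BOT_CYCLE_TIME
```

With the proposed 15-minute cap, `worst_case_redness()` gives the 30 minutes of redness per break mentioned in the message; Niko's suggested 30-minute window would correspond to calling it with `timedelta(minutes=30)`.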
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Fri, Feb 26, 2010 at 8:55 AM, Adam Barth aba...@webkit.org wrote: On Fri, Feb 26, 2010 at 8:47 AM, Alex Milowski a...@milowski.com wrote: On Fri, Feb 26, 2010 at 8:19 AM, Adam Barth aba...@webkit.org wrote: On Fri, Feb 26, 2010 at 7:24 AM, Alex Milowski a...@milowski.com wrote: On Fri, Feb 26, 2010 at 7:17 AM, Eric Seidel e...@webkit.org wrote: On Fri, Feb 26, 2010 at 7:12 AM, Alex Milowski a...@milowski.com wrote: The only EWS which requires committer access is Mac-EWS. All other EWS bots will run any patch. Why is that? That's the platform I'm most interested in see run. Various reasons. Mostly due to our current hardware setup. If someone has some mac hardware they'd like to donate to the cause it would be most welcome. That seems really, really solvable. The core issue here is that the license for Mac OS X prevents us from running the OS in a virtual machine. The way we protect ourselves from random folks haxoring the EWS on Linux is by running them on EC2 and re-imagining the machines periodically. So, it is possible to run Mac OS X on a virtual machine: Oh, awesome! The real issue is you can't run this in the cloud like on an EC2 server because of the hardware restriction in Apple's license, right? EC2 has support for Linux and Windows, but not Mac. I have been meaning to set up a Windows box, but I haven't gotten around to it yet. If you know of a cloud provider that has Mac, we can set up the mac-ews there. The only non-dedicated server hosting provider I've found is GoDaddy: http://www.godaddy.com/hosting/mac-hosting.aspx I don't know if starting/stopping instances is as easy as Amazon's EC2 service (which I use). I've never used their virtual hosting service. -- --Alex Milowski The excellence of grammar as a guide is proportional to the paucity of the inflexions, i.e. to the degree of analysis effected by the language considered. 
Bertrand Russell in a footnote of Principles of Mathematics
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Fri, Feb 26, 2010 at 9:44 AM, Eric Seidel e...@webkit.org wrote: The bots take 15 minutes to cycle. The moment the build is broken, thats FIX_TIME + BOT_CYCLE_TIME until their green again. I think we should cap the fix grace period at something like 15 minutes, that means no more than 30 minutes of tree redness per break. That might be too aggressive to start with for WebKit, but I think we should move towards that. I would re-write rule one as something like this: 1. Comment in the bugzilla bug when the build breaks. If there is no bugzilla bug, comment in #webkit. 2. 15 minutes after the break or 10 minutes after the comment, with no reply from the breaker, roll out the patch. Sounds great. Is this going to be a new page on webkit.org? :DG -eric On Fri, Feb 26, 2010 at 9:32 AM, Nikolas Zimmermann zimmerm...@physik.rwth-aachen.de wrote: Am 26.02.2010 um 18:17 schrieb Dimitri Glazkov: To summarize the thread: 1) We're adopting when in doubt, roll it out approach to patches that turn tree red. 2) Need to find a way to run Mac-EWS for non-committers. 3) Enable build-break emails to webkit-dev or another opt-in mailing list What else? I'm a bit scared of rule 1. How about we define a minimum delay when to roll-out patches, after they break something? Let's say, if a commit breaks the tree, give the commiter a time frame of 30 minutes to fix it - otherwhise roll-out (we could even automate that.) Example: When landing a SVG patch, that worked fine on Leopard, but broke Snow Leopard, I'd like to have some time to identify wheter it's the fault of my patch, or a platform specific problem. If it's the fault of my patch, I have no problem with reverting. But if I can't immediately fix the problem, because it's a platform specific issue, which can not be fixed (in terms of WebKit), then adding to the Skipped list, and filing a new bug just takes 5 minutes. Reverting the whole patch, just to reland it with a Skipped list addition is a bit too much work for me. 
What do others think? Cheers, Niko
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Fri, Feb 26, 2010 at 6:47 PM, Dimitri Glazkov dglaz...@chromium.org wrote:
> On Fri, Feb 26, 2010 at 9:44 AM, Eric Seidel e...@webkit.org wrote:
>> The bots take 15 minutes to cycle. The moment the build is broken, that's FIX_TIME + BOT_CYCLE_TIME until they're green again. I think we should cap the fix grace period at something like 15 minutes, which means no more than 30 minutes of tree redness per break. That might be too aggressive to start with for WebKit, but I think we should move towards that.
>>
>> I would re-write rule one as something like this:
>> 1. Comment in the bugzilla bug when the build breaks. If there is no bugzilla bug, comment in #webkit.
>> 2. 15 minutes after the break or 10 minutes after the comment, with no reply from the breaker, roll out the patch.
>
> Sounds great. Is this going to be a new page on webkit.org?

Agree it sounds like a good plan.

> Re the emails: who knows how to do that? Can someone own this process to completion and do it as soon as possible? It'd be much appreciated!
>
> :DG
>
>> -eric
>>
>> On Fri, Feb 26, 2010 at 9:32 AM, Nikolas Zimmermann zimmerm...@physik.rwth-aachen.de wrote:
>>> On 26.02.2010, at 18:17, Dimitri Glazkov wrote:
>>>> To summarize the thread:
>>>> 1) We're adopting a "when in doubt, roll it out" approach to patches that turn the tree red.
>>>> 2) Need to find a way to run Mac-EWS for non-committers.
>>>> 3) Enable build-break emails to webkit-dev or another opt-in mailing list.
>>>> What else?
>>>
>>> I'm a bit scared of rule 1. How about we define a minimum delay before rolling out patches after they break something? Let's say, if a commit breaks the tree, give the committer a time frame of 30 minutes to fix it, otherwise roll out (we could even automate that).
>>>
>>> Example: When landing an SVG patch that worked fine on Leopard but broke Snow Leopard, I'd like to have some time to identify whether it's the fault of my patch or a platform-specific problem. If it's the fault of my patch, I have no problem with reverting. But if I can't immediately fix the problem because it's a platform-specific issue that cannot be fixed (in terms of WebKit), then adding to the Skipped list and filing a new bug just takes 5 minutes. Reverting the whole patch, just to reland it with a Skipped list addition, is a bit too much work for me.
>>>
>>> What do others think?
>>>
>>> Cheers, Niko
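
The timing in Eric's proposed rule can be sketched as a small calculation. This is only an illustration, not an existing WebKit tool: the function and constant names are hypothetical, the numbers are the ones proposed in the thread, and "15 minutes after the break or 10 minutes after the comment" is read here as "whichever comes first", which is one possible interpretation.

```python
from datetime import datetime, timedelta
from typing import Optional

# Numbers proposed in the thread; the names are hypothetical.
BOT_CYCLE_TIME = timedelta(minutes=15)
GRACE_AFTER_BREAK = timedelta(minutes=15)
GRACE_AFTER_COMMENT = timedelta(minutes=10)


def rollout_deadline(break_time: datetime,
                     comment_time: Optional[datetime] = None) -> datetime:
    """Earliest time the patch may be rolled out if the breaker never replies.

    Assumption: "or" in the rule means whichever deadline comes first.
    """
    deadline = break_time + GRACE_AFTER_BREAK
    if comment_time is not None:
        deadline = min(deadline, comment_time + GRACE_AFTER_COMMENT)
    return deadline


def worst_case_redness(fix_time: timedelta = GRACE_AFTER_BREAK) -> timedelta:
    """FIX_TIME + BOT_CYCLE_TIME: how long the tree stays red per break."""
    return fix_time + BOT_CYCLE_TIME
```

With the proposed 15-minute cap on the fix time, `worst_case_redness()` gives the 30 minutes of redness per break mentioned in the message.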
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On 26.02.2010, at 9:29, Maciej Stachowiak wrote:
> I'd like it if we had an IRC bot that announced build breakage on #webkit.

Perhaps better yet, on #webkit-build, as buildbot used to do.

- WBR, Alexey Proskuryakov
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Feb 26, 2010, at 9:58 AM, Alexey Proskuryakov wrote:
> On 26.02.2010, at 9:50, Jeremy Orlow wrote:
>>>> I would re-write rule one as something like this:
>>>> 1. Comment in the bugzilla bug when the build breaks. If there is no bugzilla bug, comment in #webkit.
>>>> 2. 15 minutes after the break or 10 minutes after the comment, with no reply from the breaker, roll out the patch.
>>>
>>> Sounds great. Is this going to be a new page on webkit.org?
>>
>> Agree it sounds like a good plan.
>
> So, is the assumption that everyone reads bugmail immediately? When pinged on #webkit, I get an audible notification, but it's likely that I won't see bugmail until much later.

I suspect the odds of most people reading bugmail within 10 minutes are pretty low.

Regards, Maciej
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Feb 26, 2010, at 9:56 AM, Alexey Proskuryakov wrote:
> On 26.02.2010, at 9:29, Maciej Stachowiak wrote:
>> I'd like it if we had an IRC bot that announced build breakage on #webkit.
>
> Perhaps better yet, on #webkit-build, as buildbot used to do.

In the past, no one ever joined #webkit-build, so this was not an effective means of notification.

Regards, Maciej
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Fri, Feb 26, 2010 at 7:06 PM, Maciej Stachowiak m...@apple.com wrote:
> On Feb 26, 2010, at 9:56 AM, Alexey Proskuryakov wrote:
>> On 26.02.2010, at 9:29, Maciej Stachowiak wrote:
>>> I'd like it if we had an IRC bot that announced build breakage on #webkit.
>>
>> Perhaps better yet, on #webkit-build, as buildbot used to do.
>
> In the past, no one ever joined #webkit-build so this was not an effective means of notification.

I didn't even know it existed until now. Was there ever an email sent out on this? If so, I missed it.
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
I think it would be more productive to start with better systems for informing people that they've broken something, and move on to rolling out patches aggressively if informing people doesn't work. It's not surprising that people neglect a red tree when they don't know about it.

A lot of the proposals on this thread would interfere with this work flow:
1. Finish patch and get it working on local machine.
2. Check in, automatically test for compatibility on other machines and OS's in parallel, resolving unexpected problems as they arise.

and change it to this work flow:
0. Purchase and set up about 15 different build environments.
1. Finish patch and get it working on local machine.
2. Manually test on build environments purchased and set up in (0).
3. Check in.

That would be a serious blow to productivity -- probably a cure that is worse than the disease.

Bear in mind that the build environments problem is multiplied by Google's choice to use a separate JavaScript engine, which effectively almost doubles the testing surface area.

Geoff
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
I didn't know that failing tests would block the commit queue. I saw they were failing yesterday afternoon and I thought it was OK to wait until this morning to fix them. My apologies for the inconvenience.

I believe a reasonable approach to handling these situations is to try to contact the person responsible for breaking the tests on IRC and, if there is no response within an hour, roll back.

I believe that requiring everyone to run the layout tests (the entire suite) before committing is the right thing to do. The only time I haven't done it was yesterday :-(. Lesson learned.

Enrica
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On 26.02.2010, at 10:15, Jeremy Orlow wrote:
> I didn't even know it existed until now. Was there ever an email sent out on this? If so, I missed it.

Buildbot used to announce results there, but it was a few years ago. My recollection is that when it worked, about half of active committers actually joined the channel. I still do, because I'm too lazy to remove it from my auto-connect list :)

Buildbot was also listening to commands on this channel, which I think worked as of several months ago. But that no longer works either.

- WBR, Alexey Proskuryakov
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Fri, Feb 26, 2010 at 10:40 AM, Geoffrey Garen gga...@apple.com wrote:
> A lot of the proposals on this thread would interfere with this work flow:
> 1. Finish patch and get it working on local machine.
> 2. Check in, automatically test for compatibility on other machines and OS's in parallel, resolving unexpected problems as they arise.

There is a non-trivial cost of this workflow on the rest of the team.
- keeps the commit-queue from running
- often results in new test failures going unnoticed because the tree is already red
- we can't generally trust that all the tests should pass locally

Clearly, every developer having access to every environment and knowing how to set up, build, and test on each environment is not an option. Would it be enough for you if you could send a patch to the EWS and get back the results for any test failures? This would let you maintain the above workflow without actually committing.

Adam/Eric, how close is the EWS to enabling that? The missing pieces as I see it are:
1. Running the layout tests as part of the EWS.
2. Giving access to the results of any failing tests.

> and change it to this work flow:
> 0. Purchase and set up about 15 different build environments.
> 1. Finish patch and get it working on local machine.
> 2. Manually test on build environments purchased and set up in (0).
> 3. Check in.
>
> That would be a serious blow to productivity -- probably a cure that is worse than the disease.
>
> Bear in mind that the build environments problem is multiplied by Google's choice to use a separate JavaScript engine, which effectively almost doubles the testing surface area.
>
> Geoff
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
> There is a non-trivial cost of this workflow on the rest of the team.
> - keeps the commit-queue from running
> - often results in new test failures going unnoticed because the tree is already red
> - we can't generally trust that all the tests should pass locally

I think all of the costs you list fundamentally derive from failures going unnoticed. That's the rationale for my suggestion that we start by making sure that failures are noticed.

> Would it be enough for you if you could send a patch to the EWS and get back the results for any test failures?

It would certainly be very helpful. I don't know if it would be enough to make me think a harsh policy of rolling out patches was a good idea.

But if we had a good system for making failures noticed, and a working EWS, and we still had problems with a red tree, I'm sure I would support some further action to solve the problem.

Geoff
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Feb 26, 2010, at 11:08 AM, Alexey Proskuryakov wrote:
> On 26.02.2010, at 10:15, Jeremy Orlow wrote:
>> I didn't even know it existed until now. Was there ever an email sent out on this? If so, I missed it.
>
> Buildbot used to announce results there, but it was a few years ago. My recollection is that when it worked, about half of active committers actually joined the channel. I still do, because I'm too lazy to remove it from my auto-connect list :)
>
> Buildbot was also listening to commands on this channel, which I think worked as of several months ago. But that no longer works either.

I believe it announced successes as well as failures, which somewhat limited its utility. I think announcing only failures (or a return to green after previous failures), plus a mention of the blameworthy committer's IRC nick, would make a much better notification system.

Regards, Maciej
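
The filtering Maciej describes (stay quiet on repeated successes, announce failures and the return to green, and address the committer by nick) could look roughly like the sketch below. It is an assumption-laden illustration: the function name, message wording, and the idea of comparing only the previous and current build result are all hypothetical, and no real buildbot or IRC API is used.

```python
from typing import Optional


def announcement(builder: str, previous_ok: bool, current_ok: bool,
                 nick: Optional[str] = None) -> Optional[str]:
    """Return the line to post on IRC, or None when nothing is worth saying."""
    if current_ok:
        # Green after green is noise; green after red is worth announcing.
        return None if previous_ok else f"{builder} is green again"
    # Every failure is announced, addressed to the committer when known.
    prefix = f"{nick}: " if nick else ""
    return f"{prefix}{builder} is red"
```

Prefixing the message with the nick relies on the behavior mentioned earlier in the thread: many IRC clients give an audible notification when a user is pinged by name.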
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Feb 26, 2010, at 11:34 AM, Geoffrey Garen wrote:
>> There is a non-trivial cost of this workflow on the rest of the team.
>> - keeps the commit-queue from running
>> - often results in new test failures going unnoticed because the tree is already red
>> - we can't generally trust that all the tests should pass locally
>
> I think all of the costs you list fundamentally derive from failures going unnoticed. That's the rationale for my suggestion that we start by making sure that failures are noticed.

I strongly agree with Geoff that our first step should be to make failures more visible.

> But if we had a good system for making failures noticed, and a working EWS, and we still had problems with a red tree, I'm sure I would support some further action to solve the problem.

I agree with this as well. One goal I have always had for the WebKit project is to have the minimum amount of policy necessary for the project to run smoothly. It seems good to me that we have less in the way of rules and bureaucracy than other open source projects of a similar scale. As the project grows, we will certainly need some additional policy. But I would prefer to take it in steps. It seems to me like making failures more visible would go a long way.

Regards, Maciej
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Feb 26, 2010, at 11:34 AM, Geoffrey Garen wrote:
>> There is a non-trivial cost of this workflow on the rest of the team.
>> - keeps the commit-queue from running
>> - often results in new test failures going unnoticed because the tree is already red
>> - we can't generally trust that all the tests should pass locally
>
> I think all of the costs you list fundamentally derive from failures going unnoticed. That's the rationale for my suggestion that we start by making sure that failures are noticed.
>
>> Would it be enough for you if you could send a patch to the EWS and get back the results for any test failures?
>
> It would certainly be very helpful. I don't know if it would be enough to make me think a harsh policy of rolling out patches was a good idea.
>
> But if we had a good system for making failures noticed, and a working EWS, and we still had problems with a red tree, I'm sure I would support some further action to solve the problem.

Mozilla has (or at least had when I worked there) two additional tree rules that helped keep the tree green:

1. A sheriff was appointed at all times, and had the authority to close the tree if there was significant build or test breakage. Closing the tree meant that it was blocked to new commits other than those intended to fix problems. Closing the tree also sends a strong message that something is broken, please pitch in and fix it if you can. Sheriff duties were shared around between responsible committers, so as not to overly burden one person.

2. The Mozilla tinderbox page (their buildbot waterfall) had a way for people to leave comments, by adding a star to a particular build with a comment. This is used as a way to communicate that someone has noticed the breakage, and is working on it.

In general, I think the waterfall page could be improved in order to make breakage archeology easier. Entries in the Changes column should be direct links to trac changesets, for example.

Simon
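
The first Mozilla rule Simon describes (a sheriff who may close the tree, with only fix commits allowed while it is closed) can be modeled in a few lines. This is a hypothetical sketch for illustration only: the class and method names are invented here, it is not Mozilla's or WebKit's actual tooling, and how a commit gets classified as a fix is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class Tree:
    """Open/closed state of the tree, controlled by the current sheriff."""
    sheriff: str
    closed: bool = False
    reason: str = ""

    def close(self, who: str, reason: str) -> None:
        if who != self.sheriff:
            raise PermissionError("only the current sheriff may close the tree")
        self.closed, self.reason = True, reason

    def reopen(self, who: str) -> None:
        if who != self.sheriff:
            raise PermissionError("only the current sheriff may reopen the tree")
        self.closed, self.reason = False, ""

    def may_commit(self, is_fix: bool) -> bool:
        # A closed tree accepts only commits intended to fix the breakage.
        return (not self.closed) or is_fix
```

Rotating sheriff duty, as described in the message, would amount to reassigning the `sheriff` field on a schedule.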
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Feb 26, 2010, at 11:43 AM, Simon Fraser wrote:
> Mozilla has (or at least had when I worked there) two additional tree rules that helped keep the tree green:
>
> 1. A sheriff was appointed at all times, and had the authority to close the tree if there was significant build or test breakage. Closing the tree meant that it was blocked to new commits other than those intended to fix problems. Closing the tree also sends a strong message that something is broken, please pitch in and fix it if you can. Sheriff duties were shared around between responsible committers, so as not to overly burden one person.

I think the build sheriff idea is a good one. Maybe what we want is to have a sheriff responsible for each build train that has an active buildbot. (It could be the same person responsible for several build trains; the main qualification would be having reasonable familiarity with a port and access to its build environment.) However, I am not so sure "close the tree" is necessarily the best focus for sheriff actions. What I'd prefer to see is that the sheriff is the person primarily responsible for reverting broken patches if they are not fixed in a timely manner. Then we could have some human judgment in the process and specific people with clear responsibility.

> 2. The Mozilla tinderbox page (their buildbot waterfall) had a way for people to leave comments, by adding a star to a particular build with a comment. This is used as a way to communicate that someone has noticed the breakage, and is working on it.

Sounds like a good idea. Wondering if that fits better in the console view or the extensions view.

> In general, I think the waterfall page could be improved in order to make breakage archeology easier. Entries in the Changes column should be direct links to trac changesets, for example.

That sounds good too. Another thing that would help is adding next page links to the console view, like we have on the waterfall. The console link often makes it easier to quickly identify the patch that went bad, but only if the badness is recent enough to show up.

Regards, Maciej
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Fri, Feb 26, 2010 at 8:53 PM, Maciej Stachowiak m...@apple.com wrote:
> On Feb 26, 2010, at 11:43 AM, Simon Fraser wrote:
>> Mozilla has (or at least had when I worked there) two additional tree rules that helped keep the tree green:
>>
>> 1. A sheriff was appointed at all times, and had the authority to close the tree if there was significant build or test breakage. Closing the tree meant that it was blocked to new commits other than those intended to fix problems. Closing the tree also sends a strong message that something is broken, please pitch in and fix it if you can. Sheriff duties were shared around between responsible committers, so as not to overly burden one person.
>
> I think the build sheriff idea is a good one. Maybe what we want is to have a sheriff responsible for each build train that has an active buildbot. (It could be the same person responsible for several build trains, the main qualification would be having reasonable familiarity with a port and access to its build environment.) However, I am not so sure close the tree is necessarily the best focus for sheriff actions. What I'd prefer to see is that the sheriff the person primarily responsible for reverting broken patches if not fixed in a timely manner. Then we could have some human judgment in the process and specific people with clear responsibility.

I agree "close the tree" is not necessary for the reasons you listed. And I think most people from Chromium would welcome this change (sheriff + ability to close). We've been advocating it for some time now. :-)
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Feb 26, 2010, at 9:29 AM, Maciej Stachowiak wrote:
> On Feb 26, 2010, at 9:17 AM, Dimitri Glazkov wrote:
>> To summarize the thread:
>> 1) We're adopting when in doubt, roll it out approach to patches that turn tree red.
>
> I think it's polite, though not mandatory, to make a reasonable effort to find the person responsible for the breakage and give them a chance to fix it. (This doesn't have to mean hunting around for hours or days, but you could send email or ask on IRC.) Also acceptable to fix it yourself, if it is obvious how.
>
>> 2) Need to find a way to run Mac-EWS for non-committers.
>> 3) Enable build-break emails to webkit-dev or another opt-in mailing list
>> What else?
>
> I'd like it if we had an IRC bot that announced build breakage on #webkit.

The buildbot master lives on hardware that cannot host IRC bots, at least by default. I'd rather the bot be external to the master, but if you really need a bot on that hardware, I can start the request process.

-Bill
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Fri, Feb 26, 2010 at 9:00 PM, Jeremy Orlow jor...@chromium.org wrote:
> On Fri, Feb 26, 2010 at 8:53 PM, Maciej Stachowiak m...@apple.com wrote:
>> On Feb 26, 2010, at 11:43 AM, Simon Fraser wrote:
>>> Mozilla has (or at least had when I worked there) two additional tree rules that helped keep the tree green:
>>>
>>> 1. A sheriff was appointed at all times, and had the authority to close the tree if there was significant build or test breakage. Closing the tree meant that it was blocked to new commits other than those intended to fix problems. Closing the tree also sends a strong message that something is broken, please pitch in and fix it if you can. Sheriff duties were shared around between responsible committers, so as not to overly burden one person.
>>
>> I think the build sheriff idea is a good one. Maybe what we want is to have a sheriff responsible for each build train that has an active buildbot. (It could be the same person responsible for several build trains, the main qualification would be having reasonable familiarity with a port and access to its build environment.) However, I am not so sure close the tree is necessarily the best focus for sheriff actions. What I'd prefer to see is that the sheriff the person primarily responsible for reverting broken patches if not fixed in a timely manner. Then we could have some human judgment in the process and specific people with clear responsibility.
>
> I agree close to the tree is not necessary for the reasons you listed. And I think most people from the Chromium would welcome this change (sheriff + ability to close). We've been advocating it for some time now. :-)

Oops, I completely misread what you said.

The reason why being able to close the tree is important is that it can sometimes take a while to sort out what caused which failures. And it's important not to allow more breakage in the meantime. In Chromium, we often have a good deal of redness, but as long as the sheriffs feel as though they're on top of it, the tree stays open. Now, I'll admit that we have many more long-running bots (like memory leak bots), and so these kinds of train wrecks that require sorting happen way less in WebKit, but it still might be nice to have the ability when necessary.

The suggestion below (2) about notes on the waterfall sounds great, but we do OK by abusing the "tree is closed/open" string to keep track of other state (like who's working on what fix). We've found this works well enough. And maybe some informal banner like this would be good enough for the first rev, unless we thought per-CL annotations would be easy to implement.

I'll note that in the Chromium project, we've had a very strong "keep the tree green" ethic for some time now, and we have a good deal of experience related to it. Certainly there are multiple ways to solve various problems, but it might be worth taking a look at how we do things to see if other parts of our process might be of interest.

>>> 2. The Mozilla tinderbox page (their buildbot waterfall) had a way for people to leave comments, by adding a star to a particular build with a comment. This is used as a way to communicate that someone has noticed the breakage, and is working on it.
>>
>> Sounds like a good idea. Wondering if that fits better in the console view or the extensions view.
>>
>>> In general, I think the waterfall page could be improved in order to make breakage archeology easier. Entries in the Changes column should be direct links to trac changesets, for example.
>>
>> That sounds good too. Another thing that would help is adding next page links to the console view, like we have on the waterfall. The console link often makes it easier to quickly identify the patch that went bad, but only if the badness is recent enough to show up.
>>
>> Regards, Maciej
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Fri, Feb 26, 2010 at 11:53 AM, Maciej Stachowiak m...@apple.com wrote:
> On Feb 26, 2010, at 11:43 AM, Simon Fraser wrote:
>> 2. The Mozilla tinderbox page (their buildbot waterfall) had a way for people to leave comments, by adding a star to a particular build with a comment. This is used as a way to communicate that someone has noticed the breakage, and is working on it.
>
> Sounds like a good idea. Wondering if that fits better in the console view or the extensions view.

Another, perhaps easier to implement, approach would be to have a single status message that is iframed at the top of the waterfall and console pages. This has proven good enough for chromium. See the message at the top of build.chromium.org.

http://chromium-status.appspot.com/current

The status can then be updated at http://chromium-status.appspot.com/ (requires login...not sure why), which also shows the last 25 statuses. People use it in ways like "2 win release failures - ojan, mac compile - dglazkov, qt failure - ???" to indicate that ojan/dglazkov are currently actively fixing those and qt has a failure that needs an owner.

For the record, I fully support making warnings more visible and improving the EWS/buildbot infrastructure before resorting to adding new policies.

On the topic of buildbot infrastructure, one problem I've had is the bots sometimes get quite behind. I made a commit last week that took *hours* before running the tests on the Windows bot. Sitting around for 30 minutes to see the tree green after a commit is one thing, sitting around for 4 hours is another. Hopefully, running tests in parallel will resolve many of these issues.

Ojan
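
The informal status convention Ojan quotes ("issue - owner", comma-separated, with "???" meaning nobody has picked the failure up yet) is simple enough to read mechanically. The sketch below assumes exactly that shape; it is a hypothetical helper, not part of the Chromium status app, and real status messages are free text that may not follow the pattern.

```python
from typing import List, Optional, Tuple


def parse_status(status: str) -> List[Tuple[str, Optional[str]]]:
    """Split a status line into (issue, owner) pairs; owner is None for '???'."""
    entries = []
    for part in status.split(","):
        part = part.strip()
        if not part:
            continue
        issue, sep, owner = part.rpartition(" - ")
        if not sep:
            # No " - owner" suffix: treat the whole part as an unowned issue.
            issue, owner = part, "???"
        owner = owner.strip()
        entries.append((issue.strip(), None if owner == "???" else owner))
    return entries


def unowned(status: str) -> List[str]:
    """Issues that still need someone to take them."""
    return [issue for issue, owner in parse_status(status) if owner is None]
```

Applied to the example from the message, `unowned(...)` would single out the qt failure as the one that still needs an owner.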
Re: [webkit-dev] The tree is on fire: a tragedy of the commons
On Feb 26, 2010, at 12:11 PM, William Siegrist wrote:
> On Feb 26, 2010, at 9:29 AM, Maciej Stachowiak wrote:
>> On Feb 26, 2010, at 9:17 AM, Dimitri Glazkov wrote:
>>> To summarize the thread:
>>> 1) We're adopting when in doubt, roll it out approach to patches that turn tree red.
>>
>> I think it's polite, though not mandatory, to make a reasonable effort to find the person responsible for the breakage and give them a chance to fix it. (This doesn't have to mean hunting around for hours or days, but you could send email or ask on IRC.) Also acceptable to fix it yourself, if it is obvious how.
>>
>>> 2) Need to find a way to run Mac-EWS for non-committers.
>>> 3) Enable build-break emails to webkit-dev or another opt-in mailing list
>>> What else?
>>
>> I'd like it if we had an IRC bot that announced build breakage on #webkit.
>
> The buildbot master lives on hardware that cannot host IRC bots, at least by default. I'd rather the bot be external to the master, but if you really need a bot on that hardware, I can start the request process.

As long as the master can notify whatever host is running the bot, it seems to me like it doesn't matter much if it needs to be the same hardware. I'm not really up on the internal details of buildbot, so I am not sure what would be easier to implement.

Regards, Maciej