Re: [openstack-dev] [all] fix latency on requirements breakage
Christopher Yeoh wrote:
> On Tue, Nov 18, 2014 at 9:32 AM, Sean Dague <s...@dague.net> wrote:
>> waiting extra long for valid test results. People don't realize their
>> code can't pass and just keep pushing patches up, consuming resources,
>> which means that parts of the project that could pass tests are backed
>> up behind 100% guaranteed failing parts. All in all, not a great
>> system.
>
> Maybe a MOTD at the top of http://review.openstack.org could help here?
> Have a button that the QA/infra people can hit when everything is
> broken that puts up a message there asking people to stop
> rechecking/submitting patches.

We can already ask statusbot (http://ci.openstack.org/irc.html#statusbot)
to show messages on status.openstack.org and log them to
https://wiki.openstack.org/wiki/Infrastructure_Status. It's just not used
as much as it used to be for CI breakage these days.

-- 
Thierry Carrez (ttx)
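(For reference, statusbot is driven by commands sent in IRC; per the documentation linked above, an alert is sent with a command roughly like the following, where the message text is of course whatever fits the situation:)

    #status alert The gate is broken due to a bad dependency release;
    please hold your rechecks until further notice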
Re: [openstack-dev] [all] fix latency on requirements breakage
Sean Dague wrote:
> As we're dealing with the fact that testtools 1.4.0 apparently broke
> something with attribute additions to tests (needed by tempest for
> filtering), it raises an interesting problem.
>
> Our current policy on requirements is to leave them open-ended; this
> lets us take upstream fixes. It also breaks us a lot. But our max
> version of dependencies happens with zero code review or testing,
> while fixing these things takes a bunch of debug, code review, and
> test time, as seen by the fact that the testtools 1.2.0 block didn't
> even manage to fully merge this weekend.
>
> This is an asymmetric break/fix path, which I think we need a better
> plan for. If fixing is more expensive than breaking, then you'll tend
> to be in a broken state quite a bit. We really want the other
> asymmetry if we can get it.

+1

> There are a couple of things we could try here:
>
> == Cap all requirements, require code reviews to bump maximums ==
>
> Benefits: protected from upstream breaks. Downsides: requires active
> energy to move forward. The SQLA 0.8 transition took forever.

I think all projects which do manual dep bumps just end up not keeping up with the state of the world over time. It's too much pain to bump them, and it introduces risk for so little gain for the party involved (you just created an externality). You would only bump stuff when you need a new feature/fix from a new version of the library, which would (1) make those bumps more costly for the poor soul who ends up needing the new version and (2) go stale on everything else, not getting free bugfixes from library authors and generally making the distributions' lives more miserable.

This is totally doable (and desirable IMHO) on stable branches though, and we plan to do just that there.

> == Provide Requirements core push authority ==
>
> For blocks on bad versions, if we had a fast path to just merge known
> breaks, we could right ourselves quicker. It would have reasonably
> strict rules, like only being usable to block individual versions.
> Probably that should also come with sending email to the dev list any
> time such a thing happened.
>
> Benefits: fast to fix. Downsides: bypasses our testing infrastructure,
> though realistically the break bypassed it as well.

That doesn't sound completely crazy. Currently we give the upstream projects the ability to directly break us, without giving our own requirements-core the ability to directly fix the breakage.

-- 
Thierry Carrez (ttx)
Re: [openstack-dev] [all] fix latency on requirements breakage
On Mon, Nov 17, 2014 at 09:32:21PM -0600, Matt Riedemann wrote:
> I think this idea has come up before; the problem is knowing how to
> distinguish the sky-is-falling type bugs from the other race bugs we
> know about. Thinking out loud, it could be severity of the bug in
> Launchpad, but we have a lot of high/critical race bugs that have been
> around for a long time and they are obviously not breaking the world.
> We could tag bugs (I'm assuming I could get bug tags from the
> Launchpad API) but we'd have to be pretty strict about not abusing the
> tag just to get attention on a bug. Other ideas?

I think just having something like a 'blocks-gate' tag would be fine. I'm not sure it could be badly abused. I'm envisioning elastic recheck giving a message along the lines of:

    This failure has been tagged as a gate blocking bug, please avoid
    rechecking until it has been addressed.

This should inform people that they shouldn't blindly recheck that patchset, and it has limited scope for abuse.
Re: [openstack-dev] [all] fix latency on requirements breakage
On 11/18/2014 06:21 AM, Louis Taylor wrote:
> On Mon, Nov 17, 2014 at 09:32:21PM -0600, Matt Riedemann wrote:
>> [...]
>
> I think just having something like a 'blocks-gate' tag would be fine.
> I'm not sure it could be badly abused. I'm envisioning elastic recheck
> giving a message along the lines of:
>
>     This failure has been tagged as a gate blocking bug, please avoid
>     rechecking until it has been addressed.
>
> This should inform people that they shouldn't blindly recheck that
> patchset, and it has limited scope for abuse.

Realistically it's also saying that folks fixing things have *another* thing to keep up on: both the people fixing the bug, and the 1000 people contributing to OpenStack. So it's more work for everyone, and it doesn't actually make the fixes come any faster.

-Sean

-- 
Sean Dague
http://dague.net
Re: [openstack-dev] [all] fix latency on requirements breakage
On 18/11/14 18:51, Thierry Carrez wrote:
> Christopher Yeoh wrote:
>> Maybe a MOTD at the top of http://review.openstack.org could help
>> here? Have a button that the QA/infra people can hit when everything
>> is broken that puts up a message there asking people to stop
>> rechecking/submitting patches.
>
> We can already ask statusbot (http://ci.openstack.org/irc.html#statusbot)
> to show messages on status.openstack.org and log them to
> https://wiki.openstack.org/wiki/Infrastructure_Status. It's just not
> used as much as it used to be for CI breakage these days.

I have to say, extending statusbot to do a MOTD on http://review.openstack.org sounds like a great idea to me. It also sounds like one of those changes to Gerrit that might actually be in the 'achievable' bucket :D

Regards,
Tom
[openstack-dev] [all] fix latency on requirements breakage
As we're dealing with the fact that testtools 1.4.0 apparently broke something with attribute additions to tests (needed by tempest for filtering), it raises an interesting problem.

Our current policy on requirements is to leave them open-ended; this lets us take upstream fixes. It also breaks us a lot. But our max version of dependencies happens with zero code review or testing, while fixing these things takes a bunch of debug, code review, and test time, as seen by the fact that the testtools 1.2.0 block didn't even manage to fully merge this weekend.

This is an asymmetric break/fix path, which I think we need a better plan for. If fixing is more expensive than breaking, then you'll tend to be in a broken state quite a bit. We really want the other asymmetry if we can get it.

There are a couple of things we could try here:

== Cap all requirements, require code reviews to bump maximums ==

Benefits: protected from upstream breaks.
Downsides: requires active energy to move forward. The SQLA 0.8 transition took forever.

== Provide Requirements core push authority ==

For blocks on bad versions, if we had a fast path to just merge known breaks, we could right ourselves quicker. It would have reasonably strict rules, like only being usable to block individual versions. Probably that should also come with sending email to the dev list any time such a thing happened.

Benefits: fast to fix.
Downsides: bypasses our testing infrastructure, though realistically the break bypassed it as well.

...

There are probably other ways to make this more symmetric. I had a grand vision one time of building a system that kind of automated the requirements bump, but I have other problems I think need to be addressed in OpenStack.

The reason I think it's important to come up with a better way here is that making our whole code gating system lock up for 12+ hrs, because of an external dependency that we are pretty sure is the crux of our break, becomes very discouraging for developers. They can't get their code merged. They can't get accurate test results. It means that once we get the fix done, everyone is rechecking their code, so now everyone is waiting extra long for valid test results. People don't realize their code can't pass and just keep pushing patches up, consuming resources, which means that parts of the project that could pass tests are backed up behind 100% guaranteed failing parts. All in all, not a great system.

-Sean

-- 
Sean Dague
http://dague.net
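(To make the two options concrete, here is roughly what each looks like as a requirements entry, in standard pip version-specifier syntax; the version numbers are illustrative only:)

    # Capping: bounds every future release; moving the cap forward
    # requires a code review.
    testtools>=0.9.36,<1.2.0

    # Blocking: excludes only the individual versions known to break us,
    # while still taking future upstream fixes automatically.
    testtools>=0.9.36,!=1.2.0,!=1.4.0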
Re: [openstack-dev] [all] fix latency on requirements breakage
Most production systems I know don't run with open-ended dependencies. One of our contributing issues IMO is that we have the requirements duplicated everywhere, and then ignore them for many of our test runs (we deliberately override the in-tree ones with global requirements). Particularly since the only reason unified requirements matter is for distro packages, and they ignore our requirements files *anyway*, I'm not sure our current aggregate system is needed in that light.

That said, making requirements be capped and auto-adjust upwards would be extremely useful IMO, but it's a chunk of work:
- we need the transitive dependencies listed, not just direct
  dependencies
- we need a thing to find possible upgrades and propose bumps
- we would need to very, very actively propagate those out from global
  requirements

For now I think making 'react to the situation faster and easier' is a good thing to push on.

-Rob

On 18 November 2014 12:02, Sean Dague <s...@dague.net> wrote:
> As we're dealing with the fact that testtools 1.4.0 apparently broke
> something with attribute additions to tests (needed by tempest for
> filtering), it raises an interesting problem.
> [...]

-- 
Robert Collins <rbtcoll...@hp.com>
Distinguished Technologist
HP Converged Cloud
Re: [openstack-dev] [all] fix latency on requirements breakage
On Tue, Nov 18, 2014 at 9:32 AM, Sean Dague <s...@dague.net> wrote:
> waiting extra long for valid test results. People don't realize their
> code can't pass and just keep pushing patches up, consuming resources,
> which means that parts of the project that could pass tests are backed
> up behind 100% guaranteed failing parts. All in all, not a great
> system.

Maybe a MOTD at the top of http://review.openstack.org could help here? Have a button that the QA/infra people can hit when everything is broken that puts up a message there asking people to stop rechecking/submitting patches.
Re: [openstack-dev] [all] fix latency on requirements breakage
Robert Collins wrote:
> Most production systems I know don't run with open-ended dependencies.
> [...]
> That said, making requirements be capped and auto-adjust upwards would
> be extremely useful IMO, but it's a chunk of work:
> - we need the transitive dependencies listed, not just direct
>   dependencies

Wouldn't a pip install of the requirements.txt from the requirements repo itself get this? That would tell pip to download all the things and their transitive dependencies (aka step #1).

> - we need a thing to find possible upgrades and propose bumps

This is an analysis of the $ pip freeze after installing into that virtualenv (aka step #2)?

> - we would need to very, very actively propagate those out from global
>   requirements

Sounds like an enhanced updater.py that uses the output from step #2?

> For now I think making 'react to the situation faster and easier' is a
> good thing to push on.

One question I have is that not all things specify all their dependencies, since some of them are pluggable. For example, kombu can use couchdb (or a transport exists that seems like it could), yet kombu doesn't list that dependency in its requirements; it is listed under 'extras_require' in https://github.com/celery/kombu/blob/master/setup.py#L122 though. I'm sure other pluggable libraries (sqlalchemy, taskflow, tooz...) are similar in this regard, so I wonder how those kinds of libraries would work with this kind of proposal.
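(A minimal sketch of those two steps, assuming a checkout containing global-requirements.txt and driving pip through subprocess; the virtualenv path is illustrative only:)

    import subprocess

    VENV = "/tmp/reqs-venv"  # illustrative scratch location

    # Step #1: install global-requirements.txt into a fresh virtualenv,
    # so that pip also pulls in all the transitive dependencies.
    subprocess.check_call(["virtualenv", VENV])
    subprocess.check_call(
        [VENV + "/bin/pip", "install", "-r", "global-requirements.txt"])

    # Step #2: freeze the result. The output is the full transitive set
    # of pinned versions, which a bump-proposal tool could diff against
    # the versions currently listed in global requirements.
    frozen = subprocess.check_output([VENV + "/bin/pip", "freeze"]).decode()
    print(frozen)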
Re: [openstack-dev] [all] fix latency on requirements breakage
On Tue, Nov 18, 2014 at 10:46:38AM +1030, Christopher Yeoh wrote:
> Maybe a MOTD at the top of http://review.openstack.org could help here?
> Have a button that the QA/infra people can hit when everything is
> broken that puts up a message there asking people to stop
> rechecking/submitting patches.

How about elastic recheck showing a message? If a bug is identified as breaking the world, it shouldn't give a helpful "feel free to leave a 'recheck' comment to run the tests again" comment when tests fail. That just encourages people to keep rechecking.
Re: [openstack-dev] [all] fix latency on requirements breakage
On 2014-11-17 16:41:02 -0800 (-0800), Joshua Harlow wrote:
> Wouldn't a pip install of the requirements.txt from the requirements
> repo itself get this? That would tell pip to download all the things
> and their transitive dependencies (aka step #1).
> [...]
> This is an analysis of the $ pip freeze after installing into that
> virtualenv (aka step #2)?
[...]

Something to keep in mind here is that just asking pip to install a list of 150 packages at particular versions doesn't actually get you that. You can't ever really cap your transitive dependencies effectively, because they are transitive: pip will ignore what you've asked for if some other package you subsequently install wants a different version of the same thing. For this reason, the result is also highly dependent on the order in which you list these dependencies. If your project lists dependencies on A<X and B<Y, and then project B (which you don't control) lists a dependency on A>X, you'll get A>X and B<Y as the end result.

Probably the closest we can come is to try to iteratively identify a set of specific versions which, when requested, are the actual versions that end up being installed, and then test and report on the effects of deviating any one of those versions upward to the next available version. I posit this will at times lead to multi-point attractors rather than static solutions, with an increasing likelihood as the list of dependencies grows.

-- 
Jeremy Stanley
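(The "requested versus actually installed" check described above can be sketched with pkg_resources, which ships with setuptools; the pins here are illustrative only:)

    import pkg_resources

    # Illustrative pins; in practice these would be read from a
    # requirements file.
    requested = ["testtools<1.2.0", "six>=1.7.0"]

    for line in requested:
        req = pkg_resources.Requirement.parse(line)
        try:
            dist = pkg_resources.get_distribution(req.project_name)
        except pkg_resources.DistributionNotFound:
            print("%s: not installed at all" % req.project_name)
            continue
        if dist not in req:
            # pip honored some other package's constraint instead of ours.
            print("%s: asked for %s but got %s" %
                  (req.project_name, line, dist.version))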
Re: [openstack-dev] [all] fix latency on requirements breakage
Sean Dague, thanks for bringing up the subject. This is highly relevant to my interests. =)

On 2014-11-17 7:10 PM, Robert Collins wrote:
> That said, making requirements be capped and auto-adjust upwards would
> be extremely useful IMO, but it's a chunk of work:
> - we need the transitive dependencies listed, not just direct
>   dependencies
> - we need a thing to find possible upgrades and propose bumps

I recently found this blog post which suggests using pip-review: http://nvie.com/posts/pin-your-packages/#pip-review

Could it be run once in a while against global requirements, with a change proposed to Gerrit to review the new updates?

> - we would need to very, very actively propagate those out from global
>   requirements
>
> For now I think making 'react to the situation faster and easier' is a
> good thing to push on.

-- 
Mathieu
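(A rough sketch of what such a periodic job might do, assuming pip-review is installed in the environment holding the global requirements and falling back to pip's own outdated listing otherwise; actually turning the output into a Gerrit change is left out:)

    import subprocess

    def list_updates():
        # pip-review with no arguments prints the available updates;
        # fall back to pip's built-in listing if it isn't installed.
        try:
            return subprocess.check_output(["pip-review"]).decode()
        except OSError:
            return subprocess.check_output(
                ["pip", "list", "--outdated"]).decode()

    # A periodic job could parse this output, rewrite the pins in
    # global-requirements.txt, and push the result to Gerrit for review.
    print(list_updates())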
Re: [openstack-dev] [all] fix latency on requirements breakage
Good point, we really need a better dependency resolver/installer...

Jeremy Stanley wrote:
> Something to keep in mind here is that just asking pip to install a
> list of 150 packages at particular versions doesn't actually get you
> that. You can't ever really cap your transitive dependencies
> effectively, because they are transitive: pip will ignore what you've
> asked for if some other package you subsequently install wants a
> different version of the same thing.
> [...]
Re: [openstack-dev] [all] fix latency on requirements breakage
On 11/17/2014 6:57 PM, Louis Taylor wrote:
> On Tue, Nov 18, 2014 at 10:46:38AM +1030, Christopher Yeoh wrote:
>> Maybe a MOTD at the top of http://review.openstack.org could help
>> here? Have a button that the QA/infra people can hit when everything
>> is broken that puts up a message there asking people to stop
>> rechecking/submitting patches.
>
> How about elastic recheck showing a message? If a bug is identified as
> breaking the world, it shouldn't give a helpful "feel free to leave a
> 'recheck' comment to run the tests again" comment when tests fail.
> That just encourages people to keep rechecking.

I think this idea has come up before; the problem is knowing how to distinguish the sky-is-falling type bugs from the other race bugs we know about. Thinking out loud, it could be severity of the bug in Launchpad, but we have a lot of high/critical race bugs that have been around for a long time and they are obviously not breaking the world. We could tag bugs (I'm assuming I could get bug tags from the Launchpad API) but we'd have to be pretty strict about not abusing the tag just to get attention on a bug. Other ideas?

-- 
Thanks,

Matt Riedemann
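(For what it's worth, bug tags are indeed queryable through launchpadlib; a minimal sketch, using 'nova' as an example project and the 'blocks-gate' tag proposed earlier in the thread:)

    from launchpadlib.launchpad import Launchpad

    # Anonymous, read-only access is enough for querying bug tags.
    lp = Launchpad.login_anonymously("gate-status-check", "production")

    # 'nova' is only an example target; any project would work the same.
    project = lp.projects["nova"]
    for task in project.searchTasks(tags=["blocks-gate"]):
        print("Bug #%s [%s]: %s" %
              (task.bug.id, task.status, task.bug.title))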