Re: [openstack-dev] [all] fix latency on requirements breakage

2014-11-18 Thread Thierry Carrez
Christopher Yeoh wrote:
 On Tue, Nov 18, 2014 at 9:32 AM, Sean Dague s...@dague.net wrote:
 
 waiting extra long for valid test results. People don't realize their
 code can't pass and just keep pushing patches up, consuming resources,
 which means that parts of the project that could pass tests are backed
 up behind 100% guaranteed failing parts. All in all, not a great system.
 
 
 Maybe a MOTD at the top of http://review.openstack.org could help here?
 Have a button that the QA/infra people can hit when everything is broken
 that puts up a message there asking people to stop rechecking/submitting
 patches.

We can already ask statusbot
(http://ci.openstack.org/irc.html#statusbot) to post messages on
status.openstack.org and log them to
https://wiki.openstack.org/wiki/Infrastructure_Status

It's just not used as much as it used to be for CI breakage these days.

-- 
Thierry Carrez (ttx)

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] fix latency on requirements breakage

2014-11-18 Thread Thierry Carrez
Sean Dague wrote:
 As we're dealing with the fact that testtools 1.4.0 apparently broke
 something with attribute additions to tests (needed by tempest for
 filtering), it raises an interesting problem.
 
 Our current policy on requirements is to leave them open ended; this
 lets us take upstream fixes. It also breaks us a lot. But moving to a
 new maximum version of a dependency happens with zero code review or
 testing.

 However, fixing these things takes a bunch of debugging, code review,
 and test time, as seen by the fact that the testtools 1.2.0 block
 didn't even manage to fully merge this weekend.

 This is an asymmetric break/fix path, which I think we need a better
 plan for. If fixing is more expensive than breaking, then you'll tend
 to be in a broken state quite a bit. We really want the opposite
 asymmetry if we can get it.

+1

 There are a couple of things we could try here:
 
 == Cap all requirements, require code reviews to bump maximums ==
 
 Benefits: protected from upstream breaks.

 Downsides: requires active energy to move forward. The SQLA 0.8
 transition took forever.

I think all projects which do manual dep bumps just end up not keeping
up with the state of the world over time. It's just too much pain to
bump them and introduce risk for so little gain for the party involved
(you've just created an externality). You would only bump stuff when you
need a new feature/fix from a new version of the library, which would
(1) make those bumps more costly for the poor soul that ends up needing
the new version and (2) go stale on everything else, not getting free
bugfixes from library authors and generally making the distributions'
lives more miserable.

This is totally doable (and desirable, IMHO) on stable branches though,
and we plan to do just that there.

 == Provide Requirements core push authority ==
 
 For blocks on bad versions, if we had a fast path to just merge known
 breaks, we could right ourselves more quickly. It would have reasonably
 strict rules, like only being usable to block individual versions. That
 should probably also come with sending email to the dev list any time
 such a thing happened.

 Benefits: fast to fix.

 Downsides: bypasses our testing infrastructure. Though realistically,
 the break bypassed it as well.

That doesn't sound completely crazy. Currently we give the upstream
projects the ability to directly break us, without giving our own
requirements-core the ability to directly fix the breakage.

-- 
Thierry Carrez (ttx)

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] fix latency on requirements breakage

2014-11-18 Thread Louis Taylor
On Mon, Nov 17, 2014 at 09:32:21PM -0600, Matt Riedemann wrote:
 I think this idea has come up before; the problem is knowing how to
 distinguish the "sky is falling" type bugs from other race bugs we know
 about. Thinking out loud, it could be the severity of the bug in
 Launchpad, but we have a lot of high/critical race bugs that have been
 around for a long time and they are obviously not breaking the world. We
 could tag bugs (I'm assuming I could get bug tags from the Launchpad API),
 but we'd have to be pretty strict about not abusing the tag just to get
 attention on a bug.
 
 Other ideas?

I think just having something like a 'blocks-gate' tag would be fine. I'm not
sure it could be badly abused. I'm envisioning elastic recheck giving a message
along the lines of:

This failure has been tagged as a gate blocking bug, please avoid
rechecking until it has been addressed.

This should inform people that they shouldn't blindly recheck that patchset and
it has limited scope for abuse.
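
For illustration only, a hypothetical Python sketch (not the real
elastic-recheck code; the names here are made up) of how the bot's
comment could branch on such a tag:

    # Hypothetical helper: pick the comment text based on whether the
    # matched bug carries a gate-blocking tag.
    GATE_BLOCKING_TAG = 'blocks-gate'

    def comment_for(bug_number, bug_tags):
        if GATE_BLOCKING_TAG in bug_tags:
            return ('This failure matches bug %d, which is tagged as gate '
                    'blocking. Please avoid rechecking until it has been '
                    'addressed.' % bug_number)
        return ('This failure matches bug %d. Feel free to leave a "recheck" '
                'comment to run the tests again.' % bug_number)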


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] fix latency on requirements breakage

2014-11-18 Thread Sean Dague
On 11/18/2014 06:21 AM, Louis Taylor wrote:
 On Mon, Nov 17, 2014 at 09:32:21PM -0600, Matt Riedemann wrote:
 I think this idea has come up before; the problem is knowing how to
 distinguish the "sky is falling" type bugs from other race bugs we know
 about. Thinking out loud, it could be the severity of the bug in
 Launchpad, but we have a lot of high/critical race bugs that have been
 around for a long time and they are obviously not breaking the world. We
 could tag bugs (I'm assuming I could get bug tags from the Launchpad API),
 but we'd have to be pretty strict about not abusing the tag just to get
 attention on a bug.

 Other ideas?
 I think just having something like a 'blocks-gate' tag would be fine. I'm
 not sure it could be badly abused. I'm envisioning elastic recheck giving
 a message along the lines of:

 This failure has been tagged as a gate blocking bug, please avoid
 rechecking until it has been addressed.

 This should inform people that they shouldn't blindly recheck that
 patchset and it has limited scope for abuse.
Realistically it's also saying that folks have *another* thing to keep up
on: both the people fixing the bug, and the 1000 people contributing to
OpenStack. So it's more work for everyone, and it doesn't actually make
the fixes come any faster.

-Sean

-- 
Sean Dague
http://dague.net




___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] fix latency on requirements breakage

2014-11-18 Thread Tom Fifield
On 18/11/14 18:51, Thierry Carrez wrote:
 Christopher Yeoh wrote:
 On Tue, Nov 18, 2014 at 9:32 AM, Sean Dague s...@dague.net wrote:

 waiting extra long for valid test results. People don't realize their
 code can't pass and just keep pushing patches up, consuming resources,
 which means that parts of the project that could pass tests are backed
 up behind 100% guaranteed failing parts. All in all, not a great system.


 Maybe a MOTD at the top of http://review.openstack.org could help here?
 Have a button that the QA/infra people can hit when everything is broken
 that puts up a message there asking people to stop rechecking/submitting
 patches.
 
 We can already ask statusbot
 (http://ci.openstack.org/irc.html#statusbot) to post messages on
 status.openstack.org and log them to
 https://wiki.openstack.org/wiki/Infrastructure_Status

 It's just not used as much as it used to be for CI breakage these days.
 

I have to say, extending statusbot to set an MOTD on
http://review.openstack.org sounds like a great idea to me. It also
sounds like one of those changes to Gerrit that might actually be in the
'achievable' bucket :D

Regards,

Tom

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [all] fix latency on requirements breakage

2014-11-17 Thread Sean Dague
As we're dealing with the fact that testtools 1.4.0 apparently broke
something with attribute additions to tests (needed by tempest for
filtering), it raises an interesting problem.

Our current policy on requirements is to leave them open ended; this
lets us take upstream fixes. It also breaks us a lot. But moving to a
new maximum version of a dependency happens with zero code review or
testing.

However, fixing these things takes a bunch of debugging, code review,
and test time, as seen by the fact that the testtools 1.2.0 block didn't
even manage to fully merge this weekend.

This is an asymmetric break/fix path, which I think we need a better
plan for. If fixing is more expensive than breaking, then you'll tend to
be in a broken state quite a bit. We really want the opposite asymmetry
if we can get it.

There are a couple of things we could try here:

== Cap all requirements, require code reviews to bump maximums ==

Benefits: protected from upstream breaks.

Downsides: requires active energy to move forward. The SQLA 0.8
transition took forever.
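
To make the option concrete, capped entries in a requirements file could
look something like the following (the version bounds here are
illustrative only, not the real global-requirements entries), where
raising a maximum requires a reviewed change:

    # illustrative only -- caps have to be bumped by an explicit review
    SQLAlchemy>=0.7.8,<0.8
    testtools>=0.9.34,<1.2.0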

== Provide Requirements core push authority ==

For blocks on bad versions, if we had a fast path to just merge known
breaks, we could right ourselves more quickly. It would have reasonably
strict rules, like only being usable to block individual versions. That
should probably also come with sending email to the dev list any time
such a thing happened.

Benefits: fast to fix.

Downsides: bypasses our testing infrastructure. Though realistically,
the break bypassed it as well.
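
Again purely as an illustration (the versions are the ones discussed in
this thread; the exact specifiers are hypothetical), blocking individual
bad releases while staying open ended would look like:

    # open ended, but with known-bad releases excluded
    testtools>=0.9.34,!=1.2.0,!=1.4.0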

...

There are probably other ways to make this more symmetric. I once had a
grand vision of building a system that kind of automated the
requirements bump, but there are other problems I think need to be
addressed in OpenStack.


The reason I think it's important to come up with a better way here is
that having our whole code gating system lock up for 12+ hrs because of
an external dependency that we are pretty sure is the crux of our break
is very discouraging for developers. They can't get their code merged.
They can't get accurate test results. It means that once we get the fix
done, everyone is rechecking their code, so now everyone is waiting
extra long for valid test results. People don't realize their code can't
pass and just keep pushing patches up, consuming resources, which means
that parts of the project that could pass tests are backed up behind
100% guaranteed failing parts. All in all, not a great system.

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] fix latency on requirements breakage

2014-11-17 Thread Robert Collins
Most production systems I know don't run with open ended dependencies.
One of our contributing issues IMO is that we have the requirements
duplicated everywhere - and then ignore them for many of our test runs
(we deliberately override the in-tree ones with global requirements).
Particularly, since the only reason unified requirements matter is for
distro packages, and they ignore our requirements files *anyway*, I'm
not sure our current aggregate system is needed in that light.

That said, making requirements be capped and auto-adjust upwards would
be extremely useful IMO, but it's a chunk of work:
 - we need the transitive dependencies listed, not just direct dependencies
 - we need a thing to find possible upgrades and propose bumps
 - we would need to very, very actively propagate those out from global
requirements

For now I think making 'react to the situation faster and easier' is a
good thing to push on.
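
As a rough sketch of the first item on that list (not an existing tool,
and assuming virtualenv is on $PATH), the transitive set can be captured
by installing the open-ended requirements into a throwaway virtualenv
and recording what pip actually installed:

    # Sketch only: produce a fully pinned, transitive list from an
    # open-ended requirements file by freezing a clean virtualenv.
    import os
    import subprocess
    import tempfile

    def freeze_transitive(requirements_file):
        venv = os.path.join(tempfile.mkdtemp(), 'venv')
        subprocess.check_call(['virtualenv', venv])
        pip = os.path.join(venv, 'bin', 'pip')
        subprocess.check_call([pip, 'install', '-r', requirements_file])
        frozen = subprocess.check_output([pip, 'freeze'])
        return frozen.decode('utf-8').splitlines()  # e.g. ['lxml==3.4.0', ...]

    if __name__ == '__main__':
        for pin in freeze_transitive('global-requirements.txt'):
            print(pin)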

-Rob



On 18 November 2014 12:02, Sean Dague s...@dague.net wrote:
 As we're dealing with the fact that testtools 1.4.0 apparently broke
 something with attribute additions to tests (needed by tempest for
 filtering), it raises an interesting problem.

 Our current policy on requirements is to leave them open ended; this
 lets us take upstream fixes. It also breaks us a lot. But moving to a
 new maximum version of a dependency happens with zero code review or
 testing.

 However, fixing these things takes a bunch of debugging, code review,
 and test time, as seen by the fact that the testtools 1.2.0 block
 didn't even manage to fully merge this weekend.

 This is an asymmetric break/fix path, which I think we need a better
 plan for. If fixing is more expensive than breaking, then you'll tend
 to be in a broken state quite a bit. We really want the opposite
 asymmetry if we can get it.

 There are a couple of things we could try here:

 == Cap all requirements, require code reviews to bump maximums ==

 Benefits: protected from upstream breaks.

 Downsides: requires active energy to move forward. The SQLA 0.8
 transition took forever.

 == Provide Requirements core push authority ==

 For blocks on bad versions, if we had a fast path to just merge known
 breaks, we could right ourselves more quickly. It would have reasonably
 strict rules, like only being usable to block individual versions. That
 should probably also come with sending email to the dev list any time
 such a thing happened.

 Benefits: fast to fix.

 Downsides: bypasses our testing infrastructure. Though realistically,
 the break bypassed it as well.

 ...

 There are probably other ways to make this more symmetric. I once had a
 grand vision of building a system that kind of automated the
 requirements bump, but there are other problems I think need to be
 addressed in OpenStack.


 The reason I think it's important to come up with a better way here is
 that having our whole code gating system lock up for 12+ hrs because of
 an external dependency that we are pretty sure is the crux of our break
 is very discouraging for developers. They can't get their code merged.
 They can't get accurate test results. It means that once we get the fix
 done, everyone is rechecking their code, so now everyone is waiting
 extra long for valid test results. People don't realize their code
 can't pass and just keep pushing patches up, consuming resources, which
 means that parts of the project that could pass tests are backed up
 behind 100% guaranteed failing parts. All in all, not a great system.

 -Sean

 --
 Sean Dague
 http://dague.net

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] fix latency on requirements breakage

2014-11-17 Thread Christopher Yeoh
On Tue, Nov 18, 2014 at 9:32 AM, Sean Dague s...@dague.net wrote:

 waiting extra long for valid test results. People don't realize their
 code can't pass and just keep pushing patches up, consuming resources,
 which means that parts of the project that could pass tests are backed
 up behind 100% guaranteed failing parts. All in all, not a great system.


Maybe a MOTD at the top of http://review.openstack.org could help here?
Have a button that the QA/infra people can hit when everything is broken
that puts up a message there asking people to stop rechecking/submitting
patches.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] fix latency on requirements breakage

2014-11-17 Thread Joshua Harlow

Robert Collins wrote:

Most production systems I know don't run with open ended dependencies.
One of our contributing issues IMO is that we have the requirements
duplicated everywhere - and then ignore them for many of our test runs
(we deliberately override the in-tree ones with global requirements).
Particularly, since the only reason unified requirements matter is for
distro packages, and they ignore our requirements files *anyway*, I'm
not sure our current aggregate system is needed in that light.

That said, making requirements be capped and auto-adjust upwards would
be extremely useful IMO, but it's a chunk of work:
  - we need the transitive dependencies listed, not just direct dependencies


Wouldn't a pip install of the requirements.txt from the requirements
repo itself get this? That would tell pip to download all the things and
their transitive dependencies (aka step #1).



  - we need a thing to find possible upgrades and propose bumps


This is an analysis of the $ pip freeze after installing into that 
virtualenv (aka step #2)?



  - we would need to very, very actively propagate those out from global
requirements


Sounds like an enhanced updater.py that uses the output from step #2?
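
For step #3, a minimal sketch (clearly not the real tooling; the file
names are made up) of turning the previous pins and a fresh freeze into
proposed bumps:

    # Sketch: diff a previous pinned list against a fresh pip freeze and
    # print the version bumps a review could propose.
    def parse_pins(lines):
        pins = {}
        for line in lines:
            line = line.split('#')[0].strip()
            if '==' in line:
                name, version = line.split('==', 1)
                pins[name.lower()] = version
        return pins

    def propose_bumps(old_lines, new_lines):
        old, new = parse_pins(old_lines), parse_pins(new_lines)
        for name in sorted(new):
            if name not in old:
                print('%s: new transitive dependency at %s' % (name, new[name]))
            elif old[name] != new[name]:
                print('%s: %s -> %s' % (name, old[name], new[name]))

    # hypothetical file names
    with open('pins-old.txt') as old_f, open('pins-new.txt') as new_f:
        propose_bumps(old_f.readlines(), new_f.readlines())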



For now I think making 'react to the situation faster and easier' is a
good thing to push on.


One question I have is that not all things specify all their
dependencies, since some of them are pluggable. For example, kombu can
use couchdb (or at least a transport exists that seems like it could),
yet kombu doesn't list that dependency in its requirements (it gets
listed in https://github.com/celery/kombu/blob/master/setup.py#L122
under 'extras_require' though). I'm sure other pluggable libraries
(sqlalchemy, taskflow, tooz...) are similar in this regard, so I wonder
how those kinds of libraries would work with this kind of proposal.




-Rob



On 18 November 2014 12:02, Sean Dague s...@dague.net wrote:

As we're dealing with the fact that testtools 1.4.0 apparently broke
something with attribute additions to tests (needed by tempest for
filtering), it raises an interesting problem.

Our current policy on requirements is to leave them open ended; this
lets us take upstream fixes. It also breaks us a lot. But moving to a
new maximum version of a dependency happens with zero code review or
testing.

However, fixing these things takes a bunch of debugging, code review,
and test time, as seen by the fact that the testtools 1.2.0 block didn't
even manage to fully merge this weekend.

This is an asymmetric break/fix path, which I think we need a better
plan for. If fixing is more expensive than breaking, then you'll tend to
be in a broken state quite a bit. We really want the opposite asymmetry
if we can get it.

There are a couple of things we could try here:

== Cap all requirements, require code reviews to bump maximums ==

Benefits: protected from upstream breaks.

Downsides: requires active energy to move forward. The SQLA 0.8
transition took forever.

== Provide Requirements core push authority ==

For blocks on bad versions, if we had a fast path to just merge known
breaks, we could right ourselves more quickly. It would have reasonably
strict rules, like only being usable to block individual versions. That
should probably also come with sending email to the dev list any time
such a thing happened.

Benefits: fast to fix.

Downsides: bypasses our testing infrastructure. Though realistically,
the break bypassed it as well.

...

There are probably other ways to make this more symmetric. I once had a
grand vision of building a system that kind of automated the
requirements bump, but there are other problems I think need to be
addressed in OpenStack.


The reason I think it's important to come up with a better way here is
that having our whole code gating system lock up for 12+ hrs because of
an external dependency that we are pretty sure is the crux of our break
is very discouraging for developers. They can't get their code merged.
They can't get accurate test results. It means that once we get the fix
done, everyone is rechecking their code, so now everyone is waiting
extra long for valid test results. People don't realize their code can't
pass and just keep pushing patches up, consuming resources, which means
that parts of the project that could pass tests are backed up behind
100% guaranteed failing parts. All in all, not a great system.

 -Sean

--
Sean Dague
http://dague.net

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev






___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] fix latency on requirements breakage

2014-11-17 Thread Louis Taylor
On Tue, Nov 18, 2014 at 10:46:38AM +1030, Christopher Yeoh wrote:
 Maybe a MOTD at the top of http://review.openstack.org could help here?  Have
 a button that the QA/infra people can hit when everything is broken that puts
 up a message there asking people to stop rechecking/submitting patches.

How about elastic recheck showing a message? If a bug is identified as
breaking the world, it shouldn't give a helpful "feel free to leave a
'recheck' comment to run the tests again" comment when tests fail. That
just encourages people to keep rechecking.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] fix latency on requirements breakage

2014-11-17 Thread Jeremy Stanley
On 2014-11-17 16:41:02 -0800 (-0800), Joshua Harlow wrote:
 Robert Collins wrote:
 [...]
 That said, making requirements be capped and auto-adjust upwards would
 be extremely useful IMO, but it's a chunk of work:
   - we need the transitive dependencies listed, not just direct dependencies
 
 Wouldn't a pip install of the requirements.txt from the requirements repo
 itself get this? That would tell pip to download all the things and their
 transitive dependencies (aka step #1).
 
   - we need a thing to find possible upgrades and propose bumps
 
 This is an analysis of the $ pip freeze after installing into that
 virtualenv (aka step #2)?
[...]

Something to keep in mind here is that just asking pip to install a
list of 150 packages at particular versions doesn't actually get you
that. You can't ever really cap your transitive dependencies
effectively because they are transitive, so pip will ignore what
you've asked for if some other package you subsequently install
wants a different version of the same. For this reason, the result
is also highly dependent on the order in which you list these
dependencies.

If your project lists dependencies on particular versions of A and B,
and then project B (which you don't control) lists a dependency on a
different version of A, the version of A you end up with may not be the
one you asked for.

Probably the closest we can come is to try to iteratively identify a
set of specific versions which when requested are the actual
versions that end up being installed, and then test and report on
the effects of deviating any one of those versions upward to the
next available version. I posit this will at times lead to
multi-point attractors rather than static solutions, with an
increasing likelihood as the list of dependencies grows.
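
A rough sketch of that iterative search (freeze_in_clean_venv() here is
a hypothetical helper along the lines discussed earlier in the thread,
taking a list of pins and returning what actually got installed):

    # Sketch: keep re-requesting the frozen set until installing it
    # reproduces itself, i.e. until requested == installed.
    def find_stable_pins(initial_pins, freeze_in_clean_venv, max_rounds=5):
        pins = initial_pins
        for _ in range(max_rounds):
            installed = freeze_in_clean_venv(pins)
            if sorted(installed) == sorted(pins):
                return pins  # fixed point: pip gave back exactly what we asked for
            pins = installed  # otherwise chase what pip actually did
        raise RuntimeError('no stable pin set found (multi-point attractor?)')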
-- 
Jeremy Stanley

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] fix latency on requirements breakage

2014-11-17 Thread Mathieu Gagné

Sean Dague, thanks for bringing up the subject.

This is highly relevant to my interests. =)

On 2014-11-17 7:10 PM, Robert Collins wrote:

Most production systems I know don't run with open ended dependencies.
One of our contributing issues IMO is that we have the requirements
duplicated everywhere - and then ignore them for many of our test runs
(we deliberately override the in-tree ones with global requirements).
Particularly, since the only reason unified requirements matter is for
distro packages, and they ignore our requirements files *anyway*, I'm
not sure our current aggregate system is needed in that light.

That said, making requirements be capped and auto-adjust upwards would
be extremely useful IMO, but it's a chunk of work:
  - we need the transitive dependencies listed, not just direct dependencies
  - we need a thing to find possible upgrades and propose bumps


I recently found this blog post which suggests using pip-review:
http://nvie.com/posts/pin-your-packages/#pip-review

Could it be run once in a while against global requirements, with a
change proposed to Gerrit to review new updates?
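
If pip-review doesn't fit, a small sketch of the same periodic check
against PyPI's JSON API (assuming a fully pinned input list like the
freeze output discussed earlier, and a made-up file name; the endpoint
shown is PyPI's public JSON interface):

    # Sketch: report pinned packages that have a newer release on PyPI.
    import json
    from urllib.request import urlopen

    def latest_version(name):
        with urlopen('https://pypi.org/pypi/%s/json' % name) as resp:
            return json.loads(resp.read().decode('utf-8'))['info']['version']

    with open('pinned-requirements.txt') as f:  # hypothetical file name
        for line in f:
            line = line.split('#')[0].strip()
            if '==' in line:
                name, pinned = line.split('==', 1)
                latest = latest_version(name)
                if latest != pinned:
                    print('%s: pinned %s, latest %s' % (name, pinned, latest))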



  - we would need to very, very actively propagate those out from global
requirements

For now I think making 'react to the situation faster and easier' is a
good thing to push on.

-Rob



--
Mathieu

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] fix latency on requirements breakage

2014-11-17 Thread Joshua Harlow

Good point, we really need a better dependency resolver/installer...

Jeremy Stanley wrote:

On 2014-11-17 16:41:02 -0800 (-0800), Joshua Harlow wrote:

Robert Collins wrote:
[...]

That said, making requirements be capped and auto-adjust upwards would
be extremely useful IMO, but it's a chunk of work:
  - we need the transitive dependencies listed, not just direct dependencies

Wouldn't a pip install of the requirements.txt from the requirements repo
itself get this? That would tell pip to download all the things and their
transitive dependencies (aka step #1).


  - we need a thing to find possible upgrades and propose bumps

This is an analysis of the $ pip freeze after installing into that
virtualenv (aka step #2)?

[...]

Something to keep in mind here is that just asking pip to install a
list of 150 packages at particular versions doesn't actually get you
that. You can't ever really cap your transitive dependencies
effectively because they are transitive, so pip will ignore what
you've asked for if some other package you subsequently install
wants a different version of the same. For this reason, the result
is also highly dependent on the order in which you list these
dependencies.

If your project lists dependencies on particular versions of A and B,
and then project B (which you don't control) lists a dependency on a
different version of A, the version of A you end up with may not be the
one you asked for.

Probably the closest we can come is to try to iteratively identify a
set of specific versions which when requested are the actual
versions that end up being installed, and then test and report on
the effects of deviating any one of those versions upward to the
next available version. I posit this will at times lead to
multi-point attractors rather than static solutions, with an
increasing likelihood as the list of dependencies grows.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] fix latency on requirements breakage

2014-11-17 Thread Matt Riedemann



On 11/17/2014 6:57 PM, Louis Taylor wrote:

On Tue, Nov 18, 2014 at 10:46:38AM +1030, Christopher Yeoh wrote:

Maybe a MOTD at the top of http://review.openstack.org could help here?  Have
a button that the QA/infra people can hit when everything is broken that puts
up a message there asking people to stop rechecking/submitting patches.


How about elastic recheck showing a message? If a bug is identified as
breaking the world, it shouldn't give a helpful "feel free to leave a
'recheck' comment to run the tests again" comment when tests fail. That
just encourages people to keep rechecking.



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



I think this idea has come up before; the problem is knowing how to
distinguish the "sky is falling" type bugs from other race bugs we know
about. Thinking out loud, it could be the severity of the bug in
Launchpad, but we have a lot of high/critical race bugs that have been
around for a long time and they are obviously not breaking the world. We
could tag bugs (I'm assuming I could get bug tags from the Launchpad
API), but we'd have to be pretty strict about not abusing the tag just
to get attention on a bug.


Other ideas?
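
On the Launchpad API assumption above: a minimal sketch with
launchpadlib, using its anonymous login and the tags attribute on the
bug object (the bug number below is purely illustrative):

    # Sketch: read a bug's tags anonymously via launchpadlib.
    from launchpadlib.launchpad import Launchpad

    def bug_tags(bug_number):
        lp = Launchpad.login_anonymously('gate-status-check', 'production')
        return lp.bugs[bug_number].tags

    # e.g.: 'blocks-gate' in bug_tags(123456)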

--

Thanks,

Matt Riedemann


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev