[openstack-dev] [nova] Bug 1781710 killing the check queue

Matt Riedemann Wed, 18 Jul 2018 09:16:18 -0700

As can be seen from logstash [1] this bug is hurting us pretty bad in the check queue.

I thought I originally had this fixed with [2] but that turned out to only be part of the issue.

I think I've identified the problem but I have failed to write a recreate regression test [3] because (I think) it's due to random ordering of which request spec we select to send to the scheduler during a multi-create request (and I tried making that predictable by sorting the instances by uuid in both conductor and the scheduler but that didn't make a difference in my test).

I started with one fix yesterday [4] but that would regress an earlier fix for resizing servers to the same host which are in an anti-affinity group. If we went that route, it would involve changes to how we handle RequestSpec.num_instances (either not persisting it, or resetting it during move operations).

After talking with Sean Mooney, we have another fix which is self-contained to the scheduler [5] so we wouldn't need to make any changes to the RequestSpec handling in conductor. It's admittedly a bit hairy, so I'm asking for some eyes on it since either way we go, we should get going soon before we hit the FF and RC1 rush which *always* kills the gate.


[1] http://status.openstack.org/elastic-recheck/index.html#1781710
[2] https://review.openstack.org/#/c/582976/
[3] https://review.openstack.org/#/c/583339
[4] https://review.openstack.org/#/c/583351
[5] https://review.openstack.org/#/c/583347

--

Thanks,

Matt

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev