On 07/18/2018 10:14 AM, Matt Riedemann wrote:
> As can be seen from logstash [1] this bug is hurting us pretty bad in the check
> queue.
>
> I thought I originally had this fixed with [2] but that turned out to only be
> part of the issue.
>
> I think I've identified the problem but I have failed to write a recreate
> regression test [3] because (I think) it's due to random ordering of which
> request spec we select to send to the scheduler during a multi-create request
> (and I tried making that predictable by sorting the instances by uuid in both
> conductor and the scheduler but that didn't make a difference in my test).

Can we get rid of multi-create? It keeps causing complications, and it already has weird behaviour if you ask for min_count=X and max_count=Y and only X instances can be scheduled. (Currently it fails with NoValidHost, but it should arguably start up X instances.)
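
To make the min_count/max_count point concrete, here is a rough sketch of the two behaviours. Everything in it is hypothetical: the function names, the `schedulable` argument (how many instances the scheduler could actually place), and the `NoValidHost` stand-in class are made up for illustration and are not Nova code. The "current" function also assumes the failure applies whenever fewer than max_count instances fit, which is my reading of the case above rather than something I have verified.

```python
class NoValidHost(Exception):
    """Stand-in for the scheduler error mentioned above; not Nova's class."""


def current_behaviour(min_count, max_count, schedulable):
    """All-or-nothing on max_count (assumed reading of today's behaviour)."""
    if schedulable < max_count:
        raise NoValidHost("only %d of %d fit" % (schedulable, max_count))
    return max_count


def arguable_behaviour(min_count, max_count, schedulable):
    """Boot as many as fit, as long as at least min_count can be placed."""
    if schedulable < min_count:
        raise NoValidHost("only %d fit, need %d" % (schedulable, min_count))
    return min(schedulable, max_count)
```

With min_count=2, max_count=5 and room for only 2 instances, the first function raises NoValidHost while the second returns 2.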

> After talking with Sean Mooney, we have another fix which is self-contained to
> the scheduler [5] so we wouldn't need to make any changes to the RequestSpec
> handling in conductor. It's admittedly a bit hairy, so I'm asking for some eyes
> on it since either way we go, we should get going soon before we hit the FF and
> RC1 rush which *always* kills the gate.

One of your options mentioned using RequestSpec.num_instances to decide whether a request is part of a multi-create. Is there any reason to persist RequestSpec.num_instances? It seems to apply only to the initial request, since after that each instance is managed individually.

Chris

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
