So, it's pretty hard to explain the model's behaviour in one small
comment, but please consider that we have a 'sort of' two-phase commit
when booting an instance.

When a request comes in, you're right: the hosts are elected
iteratively, with the resource usage of each elected node decremented in
HostState.consume_from_instance(). That means that when you ask for 10
instances of the same type, the corresponding HostState(s) are
decremented before the next filter pass, which provides a good way of
ensuring consistency. Only once all 10 instances have been elected does
the scheduler give the answer back to the *conductor*, which calls the
respective compute managers (i.e. your step #3 has been incorrect since
Juno).
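That iterative election can be sketched roughly like this (illustrative Python; HostState and consume_from_instance are modeled on Nova's names, but the filtering and weighing are heavily simplified):

```python
# Illustrative sketch, not Nova's actual code: the in-memory HostState
# is decremented between elections, so later instances in the same
# request see the capacity already consumed by earlier ones.

class HostState:
    def __init__(self, name, free_ram_mb):
        self.name = name
        self.free_ram_mb = free_ram_mb

    def consume_from_instance(self, flavor):
        # Decrement the in-memory view so the next filter pass
        # sees the reduced capacity.
        self.free_ram_mb -= flavor["ram_mb"]


def select_destinations(hosts, flavor, num_instances):
    chosen = []
    for _ in range(num_instances):
        # Filter on the *current* in-memory state, which already
        # reflects instances elected earlier in this same request.
        candidates = [h for h in hosts if h.free_ram_mb >= flavor["ram_mb"]]
        if not candidates:
            raise RuntimeError("NoValidHost")
        best = max(candidates, key=lambda h: h.free_ram_mb)
        best.consume_from_instance(flavor)
        chosen.append(best.name)
    return chosen
```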

Now, that HostState model is kept in memory and only refreshed when a
new request comes in. That means that if you have two schedulers running
separately (or two concurrent requests coming in), then yes, you could
have race conditions.

That's not really a problem in general: if your cloud is adequately
sized, the request goes to the compute manager, which uses a context
manager called "instance_claim()" to ensure that its *OWN* internal
representation is correct (and that method is thread-safe within the
context). If the scheduler's decision was incorrect, it raises an error,
which is caught by the compute manager, which then calls the conductor
again to ask for a reschedule (excluding the wrong host).
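That claim-then-retry flow can be sketched as follows (illustrative Python; instance_claim mirrors the Nova method name, but build_instance, ComputeNode, and the exception type are simplified stand-ins for the compute manager / conductor interaction):

```python
import threading

# Hedged sketch, not Nova's actual API: the compute node re-checks
# resources under a lock, and a failed claim triggers a "reschedule"
# that excludes the host whose claim failed.

class ComputeResourcesUnavailable(Exception):
    pass


class ComputeNode:
    def __init__(self, free_ram_mb):
        self.free_ram_mb = free_ram_mb
        self._lock = threading.Lock()

    def instance_claim(self, ram_mb):
        # Thread-safe within this compute node: re-validate against the
        # node's *own* internal representation before consuming.
        with self._lock:
            if self.free_ram_mb < ram_mb:
                raise ComputeResourcesUnavailable()
            self.free_ram_mb -= ram_mb


def build_instance(nodes, ram_mb, excluded=None):
    excluded = set(excluded or [])
    for name, node in nodes.items():  # stand-in for the scheduler's choices
        if name in excluded:
            continue
        try:
            node.instance_claim(ram_mb)
            return name
        except ComputeResourcesUnavailable:
            # Claim failed: ask for a reschedule, excluding this host.
            excluded.add(name)
    raise RuntimeError("MaxRetriesExceeded")
```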

So, see, when we have races, we have retries (that's the 2PC I
mentioned). That's not perfect, in particular when the cloud is fairly
full, and that's why we're working towards resolving it through multiple
possibilities:

https://review.openstack.org/#/c/192260/7/doc/source/scheduler_evolution.rst,cm

To be honest, I don't see clear actionable items in your bug report.
I'd rather suggest you join the scheduler meetings, held every Monday at
1400 UTC, if you wish to help us and contribute.


** Changed in: nova
       Status: New => Opinion

** Changed in: nova
   Importance: Undecided => Wishlist

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1488986

Title:
  nova scheduler for race condition

Status in OpenStack Compute (nova):
  Opinion

Bug description:
  a) the nova compute service updates compute-node info by running
  update_available_resource every CONF.update_resources_interval (60s by
  default).
  b) for every scheduler request:
  1. select_destinations is called and gets all HostStates (if the
  compute-node record is newer than the local HostState info, based on
  updated_at, the HostState is updated with the compute info from the DB).
  2. check whether each host's resources can meet the instance
  requirement, one host at a time, updating the HostState resources
  iteratively; if yes, send a build_and_run_instance cast RPC to the
  corresponding compute node.
  3. the compute service accepts the AMQP message, consumes the instance
  requirement, and writes the new compute info into the DB.
  4. compute tries to spawn the instance; if that fails, roll back step 3.
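  Steps (a) and (1) above can be sketched like this (illustrative
  Python; the field and function names mirror the description, not
  Nova's exact schema):

```python
import time

# Illustrative sketch, not Nova code: the compute service periodically
# writes its resource view to the DB, and the scheduler refreshes its
# in-memory cached state only when the DB row is newer (compared via
# updated_at).

db_row = {"free_ram_mb": 4096, "updated_at": 0.0}  # stand-in for the DB

def update_available_resource(free_ram_mb):
    # Compute-service side: periodic task writing fresh usage to the DB.
    db_row["free_ram_mb"] = free_ram_mb
    db_row["updated_at"] = time.time()


class CachedHostState:
    def __init__(self):
        self.free_ram_mb = None
        self.updated_at = -1.0

    def refresh_from_db(self):
        # Scheduler side: only copy the row if it is newer than the
        # locally cached state.
        if db_row["updated_at"] > self.updated_at:
            self.free_ram_mb = db_row["free_ram_mb"]
            self.updated_at = db_row["updated_at"]


hs = CachedHostState()
hs.refresh_from_db()
```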

  My question:
  If a user sets CONF.update_resources_interval to 1s, the compute node
  service updates compute info in the DB every 1s.
  Consider this case: the user sends multiple nova boot requests. The
  first boot request reaches step 2 while the compute node service is
  running the periodic task update_available_resource at the same time.
  The second boot request then reaches step 1 before the first request
  has reached step 3, so the second boot request gets a HostState that
  does not include the first instance's consumption, and the scheduler
  will pick a host for it without considering that consumption.
  Subsequent requests repeat the pattern.

  Can this race condition occur?
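  The interleaving being asked about boils down to two requests reading
  the same stale snapshot before either one's consumption is written
  back, e.g. (illustrative Python, not Nova code):

```python
# Illustrative sketch of the described race: two requests each read a
# snapshot of host capacity before either one's consumption lands, so
# both pick the same host even though it only fits one instance.

db_free_ram = {"host1": 2048}  # stand-in for the compute info in the DB

def schedule(flavor_ram):
    snapshot = dict(db_free_ram)  # step 1: refresh HostState from the DB
    for host, free in snapshot.items():  # step 2: pick a host that fits
        if free >= flavor_ram:
            return host
    return None

# Both requests read the DB before either consumption is written back:
first = schedule(2048)
second = schedule(2048)  # stale view: host1 still looks free
```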

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1488986/+subscriptions
