Public bug reported:

Looking at these scheduler logs during the initial select_destinations
flow for a server create:

http://logs.openstack.org/89/527289/1/check/ironic-tempest-dsvm-ipa-wholedisk-agent_ipmitool-tinyipa-multinode/406d4ab/logs/screen-n-sch.txt.gz#_Dec_12_22_23_47_783670

There are 12 candidate hosts and the RetryFilter logs that re-scheduling
is disabled on all 12 of them:

Dec 12 22:23:47.783670 ubuntu-xenial-ovh-bhs1-0001397778 nova-scheduler[19797]: DEBUG nova.filters [None req-bd0816cf-243e-4ceb-90cb-053b15dbcbc7 tempest-TestServerBasicOps-1888713420 tempest-TestServerBasicOps-1888713420] Starting with 12 host(s) {{(pid=19797) get_filtered_objects /opt/stack/new/nova/nova/filters.py:70}}
Dec 12 22:23:47.783964 ubuntu-xenial-ovh-bhs1-0001397778 nova-scheduler[19797]: DEBUG nova.scheduler.filters.retry_filter [None req-bd0816cf-243e-4ceb-90cb-053b15dbcbc7 tempest-TestServerBasicOps-1888713420 tempest-TestServerBasicOps-1888713420] Re-scheduling is disabled {{(pid=19797) host_passes /opt/stack/new/nova/nova/scheduler/filters/retry_filter.py:38}}
...

This is confusing because CONF.scheduler.max_attempts is the default
value of 3:

Dec 12 22:01:58.784033 ubuntu-xenial-ovh-bhs1-0001397778 nova-scheduler[19797]: DEBUG oslo_service.service [None req-f45f1271-8d1d-4bb7-8bea-4d3a39be493c None None] scheduler.max_attempts         = 3 {{(pid=19797) log_opt_values /usr/local/lib/python2.7/dist-packages/oslo_config/cfg.py:2888}}

So retries are not disabled. The problem appears to be that we don't set
the RequestSpec.retry field before calling select_destinations.
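
A minimal sketch of why the message shows up, assuming the filter only
looks at whether the request spec carries retry info and never reads
max_attempts itself. FakeRequestSpec and retry_filter_passes are
hypothetical stand-ins written for illustration, not nova's actual code:

```python
import logging

LOG = logging.getLogger("retry_filter_sketch")


class FakeRequestSpec:
    """Hypothetical stand-in for RequestSpec; only the retry field matters."""

    def __init__(self, retry=None):
        self.retry = retry


def retry_filter_passes(host, spec):
    """Simplified model of a retry filter's per-host check (an assumption)."""
    if not spec.retry:
        # No retry info was populated, so the filter cannot distinguish
        # "retries are disabled" from "retry info has not been set yet".
        LOG.debug("Re-scheduling is disabled")
        return True
    # With retry info present, skip hosts that were already attempted.
    return host not in spec.retry["hosts"]
```

With an unset retry field every candidate host passes and the debug line
is emitted once per host, which matches what the logs above show for
all 12 hosts.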

The schedule_and_build_instance method in conductor calls the scheduler
here:

https://github.com/openstack/nova/blob/07c925a5321e379293bbf0e55bf3c40798eaf21b/nova/conductor/manager.py#L1016

And doesn't populate filter properties (request spec) until later:

https://github.com/openstack/nova/blob/07c925a5321e379293bbf0e55bf3c40798eaf21b/nova/conductor/manager.py#L1108

Compare that to the old build_instances method where retry is populated
before calling the scheduler:

https://github.com/openstack/nova/blob/07c925a5321e379293bbf0e55bf3c40798eaf21b/nova/conductor/manager.py#L543

https://github.com/openstack/nova/blob/07c925a5321e379293bbf0e55bf3c40798eaf21b/nova/conductor/manager.py#L548
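
The ordering difference between the two methods can be sketched roughly
as follows. Every function here is a hypothetical stand-in (including
the "retry attempt" string, which is not a real nova log message), and
the sketch assumes that only max_attempts == 1 truly disables retries:

```python
MAX_ATTEMPTS = 3  # mirrors the scheduler.max_attempts value logged above


def populate_retry(filter_properties, max_attempts=MAX_ATTEMPTS):
    """Hypothetical helper: record retry info unless only one attempt is allowed."""
    if max_attempts == 1:
        return  # assumed meaning of "retries really are disabled"
    retry = filter_properties.setdefault(
        "retry", {"num_attempts": 0, "hosts": []})
    retry["num_attempts"] += 1


def select_destinations(filter_properties):
    """Stand-in scheduler call that reports what a retry filter would see."""
    retry = filter_properties.get("retry")
    if not retry:
        return "Re-scheduling is disabled"
    return "retry attempt %d" % retry["num_attempts"]


def old_flow():
    """Models build_instances: retry is populated, then the scheduler runs."""
    props = {}
    populate_retry(props)
    return select_destinations(props)


def new_flow():
    """Models schedule_and_build_instance: the scheduler runs first."""
    props = {}
    message = select_destinations(props)
    populate_retry(props)  # too late to affect the scheduler's log output
    return message
```

In this model old_flow() lets the scheduler see the retry info, while
new_flow() produces the misleading message even though MAX_ATTEMPTS is 3.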

This is mostly just a serviceability bug because we'll continue to
support reschedules if the chosen compute host fails, but it's
definitely confusing when looking at the scheduler logs on the initial
create.

** Affects: nova
     Importance: Medium
         Status: Triaged


** Tags: conductor scheduler serviceability

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1738876

Title:
  Scheduler logs "Re-scheduling is disabled" on initial server create
  scheduling even though max_attempts>0

Status in OpenStack Compute (nova):
  Triaged

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1738876/+subscriptions
