Re: [Openstack-operators] [openstack-operators][nova] Verbosity of nova scheduler

Matt Riedemann Wed, 10 Jan 2018 11:35:38 -0800

On 1/10/2018 1:15 AM, Alec Hothan (ahothan) wrote:

+1 on the “no valid host found”, this one should be at the very top ofthe to-be-fixed list.
Very difficult to troubleshoot filters in lab testing (let alone inproduction) as there can be many of them. This will get worst with moreNFV optimization filters with so many combinations it gets reallycomplex to debug when a VM cannot be launched with NFV optimizedflavors. With the scheduler filtering engine, there should be asystematic way to log the reason for not finding a valid host - at thevery least the error should display which filter failed as an ERROR (andnot as DEBUG).
We really need to avoid deploying with DEBUG log level but unfortunatelythis is the only way to troubleshoot. Too many debug-level logs are fordevelopment debug (meaning pretty much useless in any circumstances –developer forgot to remove before commit of the feature), many errorsthat should be logged as ERROR but have been logged as DEBUG only.

The scheduler logs do print which filter returned 0 hosts for a givenrequest at INFO level.


For example, I was debugging this NoValidHost failure in a CI run:

http://logs.openstack.org/20/531020/3/check/tempest-full/1287dde/job-output.txt.gz#_2018-01-05_17_39_54_336999

And tracing the request ID to the scheduler logs, and filtering on justINFO level logging to simulate what you'd have for the default inproduction, I found the filter that kicked it out here:


http://logs.openstack.org/20/531020/3/check/tempest-full/1287dde/controller/logs/screen-n-sch.txt.gz?level=INFO#_Jan_05_17_00_30_564582

And there is a summary like this:

Jan 05 17:00:30.565996 ubuntu-xenial-infracloud-chocolate-0001705073nova-scheduler[8932]: INFO nova.filters [Nonereq-737984ae-3ae8-4506-a5f9-6655a4ebf206tempest-ServersAdminTestJSON-787960229tempest-ServersAdminTestJSON-787960229] Filtering removed all hosts forthe request with instance ID '8ae8dc23-8f3b-4f0f-8775-2dcc2a5fc75b'.Filter results: ['RetryFilter: (start: 1, end: 1)','AvailabilityZoneFilter: (start: 1, end: 1)', 'ComputeFilter: (start: 1,end: 1)', 'ComputeCapabilitiesFilter: (start: 1, end: 1)','ImagePropertiesFilter: (start: 1, end: 1)','ServerGroupAntiAffinityFilter: (start: 1, end: 1)','ServerGroupAffinityFilter: (start: 1, end: 1)', 'SameHostFilter:(start: 1, end: 0)']

"end: 0" means that's the filter that rejected the request. Withoutdigging into the actual code, the descriptions for the filters is here:


https://docs.openstack.org/nova/latest/user/filter-scheduler.html

Now just why this request failed requires a bit of understanding of whymy environment looks like (this CI run is using the CachingScheduler),so there isn't a simple "ERROR: SameHostFilter rejected request becauseyou're using the CachingScheduler which is racy by design and youcreated the instances in separate requests". You'd have a ton of falsenegative ERRORs in the logs because of valid reasons for rejected arequest based on the current state of the system, which is going to makedebugging real issues that much harder.


--

Thanks,

Matt

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

Re: [Openstack-operators] [openstack-operators][nova] Verbosity of nova scheduler

Reply via email to