Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-15 Thread Chris Friesen
On 08/04/2018 05:18 PM, Matt Riedemann wrote: On 8/3/2018 9:14 AM, Chris Friesen wrote: I'm of two minds here. On the one hand, you have the case where the end user has accidentally requested some combination of things that isn't normally available, and they need to be able to ask the provider

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-14 Thread Eric Fried
Folks- The patch mentioned below [1] has undergone several rounds of review and collaborative revision, and we'd really like to get your feedback on it. From the commit message: Here are some examples of the debug output: - A request for three resources with no aggregate or trait

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-09 Thread Jay Pipes
On Wed, Aug 1, 2018 at 11:15 AM, Ben Nemec wrote: > Hi, > > I'm having an issue with no valid host errors when starting instances and > I'm struggling to figure out why. I thought the problem was disk space, > but I changed the disk_allocation_ratio and I'm still getting no valid > host. The

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-06 Thread Jay Pipes
On 08/04/2018 07:35 PM, Michael Glasgow wrote: On 8/2/2018 7:27 PM, Jay Pipes wrote: It's not an exception. It's normal course of events. NoValidHosts means there were no compute nodes that met the requested resource amounts. To clarify, I didn't mean a python exception. Neither did I. I

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-04 Thread Michael Glasgow
On 8/2/2018 7:27 PM, Jay Pipes wrote: It's not an exception. It's normal course of events. NoValidHosts means there were no compute nodes that met the requested resource amounts. To clarify, I didn't mean a python exception. I concede that I should've chosen a better word for the type of

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-04 Thread Matt Riedemann
On 8/3/2018 9:14 AM, Chris Friesen wrote: I'm of two minds here. On the one hand, you have the case where the end user has accidentally requested some combination of things that isn't normally available, and they need to be able to ask the provider what they did wrong.  I agree that this

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-04 Thread Matt Riedemann
On 8/2/2018 10:07 AM, Ben Nemec wrote: Now it seems like I need to do: 1) Change disk_allocation_ratio in nova.conf 2) Restart nova-scheduler, nova-compute, and nova-placement (or some subset of those?) Restarting the placement service wouldn't have any effect here. Wouldn't I need to

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-04 Thread Matt Riedemann
On 8/2/2018 3:04 PM, Chris Friesen wrote: At a previous Summit[1] there were some operators that said they just always ran nova-scheduler with debug logging enabled in order to deal with this issue, but that it was a pain to isolate the useful logs from the not-useful ones. Using CONF.trace

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-03 Thread Eric Fried
> I'm of two minds here. > > On the one hand, you have the case where the end user has accidentally > requested some combination of things that isn't normally available, and > they need to be able to ask the provider what they did wrong.  I agree > that this case is not really an exception, those

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-03 Thread Chris Friesen
On 08/02/2018 06:27 PM, Jay Pipes wrote: On 08/02/2018 06:18 PM, Michael Glasgow wrote: More generally, any time a service fails to deliver a resource which it is primarily designed to deliver, it seems to me at this stage that should probably be taken a bit more seriously than just "check

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-02 Thread Jay Pipes
On 08/02/2018 06:18 PM, Michael Glasgow wrote: On 08/02/18 15:04, Chris Friesen wrote: On 08/02/2018 01:04 PM, melanie witt wrote: The problem is an infamous one, which is, your users are trying to boot instances and they get "No Valid Host" and an instance in ERROR state. They contact

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-02 Thread Michael Glasgow
On 08/02/18 15:04, Chris Friesen wrote: On 08/02/2018 01:04 PM, melanie witt wrote: The problem is an infamous one, which is, your users are trying to boot instances and they get "No Valid Host" and an instance in ERROR state. They contact support, and now support is trying to determine why

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-02 Thread Jeremy Stanley
On 2018-08-02 14:04:10 -0600 (-0600), Chris Friesen wrote: [...] > At a previous Summit[1] there were some operators that said they just always > ran nova-scheduler with debug logging enabled in order to deal with this > issue, but that it was a pain to isolate the useful logs from the not-useful

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-02 Thread Chris Friesen
On 08/02/2018 01:04 PM, melanie witt wrote: The problem is an infamous one, which is, your users are trying to boot instances and they get "No Valid Host" and an instance in ERROR state. They contact support, and now support is trying to determine why NoValidHost happened. In the past, they

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-02 Thread melanie witt
On Thu, 2 Aug 2018 13:20:43 -0500, Eric Fried wrote: And we could do the same kind of approach with the non-granular request groups by reducing the single large SQL statement that is used for all resources and all traits (and all agg associations) into separate SELECT statements. It could be

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-02 Thread Eric Fried
> And we could do the same kind of approach with the non-granular request > groups by reducing the single large SQL statement that is used for all > resources and all traits (and all agg associations) into separate SELECT > statements. > > It could be slightly less performance-optimized but more

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-02 Thread Jay Pipes
On 08/02/2018 01:40 PM, Eric Fried wrote: Jay et al- And what I'm referring to is doing a single query per "related resource/trait placement request group" -- which is pretty much what we're heading towards anyway. If we had a request for: GET /allocation_candidates?  resources0=VCPU:1&  

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-02 Thread Eric Fried
I should have made it clear that this is a tiny incremental improvement, to a code path that almost nobody is even going to see until Stein. In no way was it intended to close this topic. Thanks, efried On 08/02/2018 12:40 PM, Eric Fried wrote: > Jay et al- > >> And what I'm referring to is

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-02 Thread Eric Fried
Jay et al- > And what I'm referring to is doing a single query per "related > resource/trait placement request group" -- which is pretty much what > we're heading towards anyway. > > If we had a request for: > > GET /allocation_candidates? >  resources0=VCPU:1& >  

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-02 Thread Joshua Harlow
Storage space is a concern; really? If it really is, then keep X of them for some definition of X (days, number, hours, other)? Offload the snapshot asynchronously if snapshotting during requests is a problem. We have the power! :) Chris Friesen wrote: On 08/01/2018 11:34 PM, Joshua Harlow

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-02 Thread Ben Nemec
On 08/01/2018 06:05 PM, Matt Riedemann wrote: On 8/1/2018 3:55 PM, Ben Nemec wrote: I changed disk_allocation_ratio to 2.0 in the config file and it had no effect on the existing resource provider.  I assume that is because I had initially deployed with it unset, so I got 1.0, and when I

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-02 Thread Jay Pipes
On 08/02/2018 01:12 AM, Alex Xu wrote: 2018-08-02 4:09 GMT+08:00 Jay Pipes: On 08/01/2018 02:02 PM, Chris Friesen wrote: On 08/01/2018 11:32 AM, melanie witt wrote: I think it's definitely a significant issue that

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-02 Thread Chris Friesen
On 08/02/2018 04:10 AM, Chris Dent wrote: When people ask for something like what Chris mentioned: hosts with enough CPU: hosts that also have enough disk: hosts that also have enough memory: hosts that also meet extra spec host aggregate keys: hosts that also meet
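
The staged narrowing quoted above (hosts with enough CPU, then those that also have enough disk, and so on) can be sketched as a chain of filters that records how many hosts survive each stage. This is only an illustration of the idea, not nova or placement code: the `Host` fields, the stage labels, the sample numbers, and `explain_filtering` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Host:
    # Hypothetical, simplified view of a compute node's free resources.
    name: str
    free_vcpu: int
    free_disk_gb: int
    free_memory_mb: int


Stage = Tuple[str, Callable[[Host], bool]]


def explain_filtering(hosts: List[Host], stages: List[Stage]) -> List[Tuple[str, int]]:
    """Apply each stage in order and record how many hosts remain after it."""
    report = []
    remaining = list(hosts)
    for label, predicate in stages:
        remaining = [h for h in remaining if predicate(h)]
        report.append((label, len(remaining)))
    return report


# Made-up inventory and request, purely for illustration.
hosts = [
    Host("node1", free_vcpu=4, free_disk_gb=10, free_memory_mb=8192),
    Host("node2", free_vcpu=8, free_disk_gb=100, free_memory_mb=1024),
    Host("node3", free_vcpu=0, free_disk_gb=500, free_memory_mb=16384),
]
stages = [
    ("enough CPU", lambda h: h.free_vcpu >= 2),
    ("also enough disk", lambda h: h.free_disk_gb >= 20),
    ("also enough memory", lambda h: h.free_memory_mb >= 2048),
]

report = explain_filtering(hosts, stages)
for label, count in report:
    print(f"hosts with {label}: {count}")
```

The first stage whose count drops to zero points at the constraint that produced the "no valid host" result, which is the kind of information the thread is asking for.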

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-02 Thread Chris Friesen
On 08/01/2018 11:34 PM, Joshua Harlow wrote: And I would be able to say request the explanation for a given request id (historical even) so that analysis could be done post-change and pre-change (say I update the algorithm for selection) so that the effects of alterations to said decisions

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-02 Thread Chris Dent
Responses to some of Jay's comments below, but first, to keep this on track with the original goal of the thread ("How to debug no valid host failures with placement") before I drag it to the side, some questions. When people ask for something like what Chris mentioned: hosts with enough

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-01 Thread Joshua Harlow
If I could, I would have something *like* the EXPLAIN syntax for looking at a sql query, but instead of telling me the query plan for a sql query, it would tell me the decisions (placement plan?) that resulted in a given resource being placed at a certain location. And I would be able to say

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-01 Thread Alex Xu
2018-08-02 4:09 GMT+08:00 Jay Pipes: > On 08/01/2018 02:02 PM, Chris Friesen wrote: >> On 08/01/2018 11:32 AM, melanie witt wrote: >>> I think it's definitely a significant issue that troubleshooting "No allocation candidates returned" from placement is so difficult. However, it's

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-01 Thread Matt Riedemann
On 8/1/2018 3:55 PM, Ben Nemec wrote: I changed disk_allocation_ratio to 2.0 in the config file and it had no effect on the existing resource provider.  I assume that is because I had initially deployed with it unset, so I got 1.0, and when I later wanted to change it the provider already

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-01 Thread Ben Nemec
On 08/01/2018 02:22 PM, Matt Riedemann wrote: On 8/1/2018 12:06 PM, Ben Nemec wrote: To close the loop on the problem I was having, it looks like the allocation_ratio config opts are now just defaults, and if you want to change ratios after the initial deployment you need to do so with the

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-01 Thread Jay Pipes
On 08/01/2018 02:02 PM, Chris Friesen wrote: On 08/01/2018 11:32 AM, melanie witt wrote: I think it's definitely a significant issue that troubleshooting "No allocation candidates returned" from placement is so difficult. However, it's not straightforward to log detail in placement when the

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-01 Thread Matt Riedemann
On 8/1/2018 12:06 PM, Ben Nemec wrote: To close the loop on the problem I was having, it looks like the allocation_ratio config opts are now just defaults, and if you want to change ratios after the initial deployment you need to do so with the client. You mean how

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-01 Thread Matt Riedemann
On 8/1/2018 12:32 PM, melanie witt wrote: I think it's definitely a significant issue that troubleshooting "No allocation candidates returned" from placement is so difficult. However, it's not straightforward to log detail in placement when the request for allocation candidates is essentially

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-01 Thread Chris Friesen
On 08/01/2018 11:32 AM, melanie witt wrote: I think it's definitely a significant issue that troubleshooting "No allocation candidates returned" from placement is so difficult. However, it's not straightforward to log detail in placement when the request for allocation candidates is essentially

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-01 Thread melanie witt
On Wed, 1 Aug 2018 12:17:36 -0500, Ben Nemec wrote: On 08/01/2018 11:23 AM, Chris Friesen wrote: On 08/01/2018 09:58 AM, Andrey Volkov wrote: Hi, It seems you need first to check what placement knows about resources of your cloud. This can be done either with REST API [1] or with

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-01 Thread Chris Friesen
On 08/01/2018 11:17 AM, Ben Nemec wrote: On 08/01/2018 11:23 AM, Chris Friesen wrote: The fact that there is no real way to get the equivalent of the old detailed scheduler logs is a known shortcoming in placement, and will become more of a problem if/when we move more complicated things

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-01 Thread Ben Nemec
On 08/01/2018 11:23 AM, Chris Friesen wrote: On 08/01/2018 09:58 AM, Andrey Volkov wrote: Hi, It seems you need first to check what placement knows about resources of your cloud. This can be done either with REST API [1] or with osc-placement [2]. For osc-placement you could use: pip

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-01 Thread Ben Nemec
Aha, thanks! That explains why I couldn't find any client commands for placement before. To close the loop on the problem I was having, it looks like the allocation_ratio config opts are now just defaults, and if you want to change ratios after the initial deployment you need to do so with
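
To show why the ratio stored on the resource provider matters, here is a minimal sketch of the capacity check as I understand placement to perform it: usable capacity is `(total - reserved) * allocation_ratio`. The function names and the numbers are hypothetical and chosen only for illustration.

```python
def capacity(total: int, reserved: int, allocation_ratio: float) -> int:
    """Usable capacity of one inventory record, per the formula above."""
    return int((total - reserved) * allocation_ratio)


def fits(requested: int, used: int, total: int, reserved: int,
         allocation_ratio: float) -> bool:
    """True if the request still fits once current usage is counted."""
    return used + requested <= capacity(total, reserved, allocation_ratio)


# Hypothetical DISK_GB inventory: 40 GB total, 30 GB already allocated,
# and a new request for 20 GB.
with_default_ratio = fits(requested=20, used=30, total=40, reserved=0,
                          allocation_ratio=1.0)
with_raised_ratio = fits(requested=20, used=30, total=40, reserved=0,
                         allocation_ratio=2.0)

print(with_default_ratio)  # ratio still 1.0 on the provider: request rejected
print(with_raised_ratio)   # ratio 2.0 on the provider: request fits
```

If the provider's inventory record still carries the old ratio, the first case applies no matter what the config file says, which matches the behaviour described in this message.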

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-01 Thread Chris Friesen
On 08/01/2018 09:58 AM, Andrey Volkov wrote: Hi, It seems you need first to check what placement knows about resources of your cloud. This can be done either with REST API [1] or with osc-placement [2]. For osc-placement you could use: pip install osc-placement openstack allocation candidate

Re: [openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-01 Thread Andrey Volkov
Hi, It seems you need first to check what placement knows about resources of your cloud. This can be done either with REST API [1] or with osc-placement [2]. For osc-placement you could use: pip install osc-placement openstack allocation candidate list --resource DISK_GB=20 --resource
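
For the REST route, the request behind the `openstack allocation candidate list` command above is a `GET /allocation_candidates` call whose `resources` parameter is a comma-separated list of `CLASS:amount` pairs. A minimal sketch of building that path follows; the helper name is hypothetical, and only `DISK_GB=20` comes from the message (the `VCPU` amount is an assumed example value).

```python
from urllib.parse import urlencode


def allocation_candidates_path(resources: dict) -> str:
    """Build the path and query string for an allocation candidates request."""
    value = ",".join(f"{rc}:{amount}" for rc, amount in resources.items())
    # Keep ':' and ',' unescaped so the query stays readable.
    return "/allocation_candidates?" + urlencode({"resources": value}, safe=":,")


path = allocation_candidates_path({"DISK_GB": 20, "VCPU": 1})
print(path)
```

An empty candidate list for a request like this means placement itself found no provider with enough capacity, before any scheduler filters run.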

[openstack-dev] [nova] How to debug no valid host failures with placement

2018-08-01 Thread Ben Nemec
Hi, I'm having an issue with no valid host errors when starting instances and I'm struggling to figure out why. I thought the problem was disk space, but I changed the disk_allocation_ratio and I'm still getting no valid host. The host does have plenty of disk space free, so that shouldn't