Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

2014-08-28 Thread Paul Murray
Hi Nikola,

Firstly, thanks for waiting for me to respond and sorry I was absent for
the last couple of weeks.

The extensible resource tracker bp deals with two distinct information
flows:

1. information about resources that is passed from the compute node to the
scheduler,
2. information about resource requirements passed to the scheduler and the
compute node.

If I understand your email below correctly, you are saying that information
such as extra_specs, is not made available to the compute node or the
resource plugins. This is specifically about the second item (2.) above.

The patch that you propose to revert addresses the first item (1.), i.e. it
provides a means to select which resources are tracked and to pass that
information to the scheduler. It gives us two things: we can add resource
plugins and pass information to the scheduler without having to change the
resource tracker or scheduler. We can also pick and choose which resource
plugins to use, and so what information we want to write to the database
and pass to the scheduler.
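
To make the selection idea concrete, here is a minimal sketch in Python. It
is not the actual Nova code: the class names, the report() method and the
enabled_names argument are all hypothetical, chosen only to show how an
operator-chosen set of plugins could decide what gets reported.

```python
# Hypothetical sketch: names below are illustrative, not Nova's real API.
class ResourcePlugin:
    """Base class for a resource that a compute node may track."""
    name = "base"

    def report(self):
        """Return a dict of values to pass to the scheduler."""
        raise NotImplementedError


class VcpuPlugin(ResourcePlugin):
    name = "vcpu"

    def __init__(self, total, used):
        self.total = total
        self.used = used

    def report(self):
        return {"vcpus_total": self.total, "vcpus_used": self.used}


class NumaPlugin(ResourcePlugin):
    name = "numa"

    def __init__(self, cells):
        self.cells = cells

    def report(self):
        return {"numa_cells": self.cells}


def build_report(available, enabled_names):
    """Collect reports only from the plugins the operator enabled."""
    report = {}
    for plugin in available:
        if plugin.name in enabled_names:
            report.update(plugin.report())
    return report


available = [VcpuPlugin(total=16, used=4), NumaPlugin(cells=2)]
# An operator who does not use NUMA data simply leaves that plugin out.
print(build_report(available, enabled_names={"vcpu"}))
```

Adding a third plugin in this sketch would not require touching
build_report(), which is the property the patch is meant to provide.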

The ability to omit resource information is as useful as the ability to add
it. So if a new resource plugin is added, operators that do not use that
information do not need to configure it. As an operator myself, I would be
happy to omit the proliferation of compute node details that are coming,
while benefiting from those that are of use to me.

The plugin interface should not be considered a fixed external interface;
it is not one. It is OK to add necessary parameters if there is no other
sensible way to pass the information you need.

So in short, the ability to add resource information without impacting
everybody is the value that the patch you want to revert brings. If in the
future another design is settled on for resource tracking and scheduling it
will still have to face the same requirement. The compute node will have a
set of resource information that could be tracked and used, but not
everyone will want the overhead of discovering and reporting all of it, so
they should not need to have all of it.

Paul




On 19 August 2014 10:11, Nikola Đipanov ndipa...@redhat.com wrote:

 Since after a week of discussing it I see no compelling argument against
 reverting it - here's the proposal:

https://review.openstack.org/115218

 Thanks,
 N.

 On 08/12/2014 12:21 PM, Nikola Đipanov wrote:
  Hey Nova-istas,
 
  While I was hacking on [1] I was considering how to approach the fact
  that we now need to track one more thing (NUMA node utilization) in our
  resources. I went with - I'll add it to compute nodes table thinking
  it's a fundamental enough property of a compute host that it deserves to
  be there, although I was considering  Extensible Resource Tracker at one
  point (ERT from now on - see [2]) but looking at the code - it did not
  seem to provide anything I desperately needed, so I went with keeping it
  simple.
 
  So fast-forward a few days, and I caught myself solving a problem that I
  kept thinking ERT should have solved - but apparently hasn't, and I
  think it is fundamentally a broken design without it - so I'd really
  like to see it re-visited.
 
  The problem can be described by the following lemma (if you take 'lemma'
  to mean 'a sentence I came up with just now' :)):
 
  
  Due to the way scheduling works in Nova (roughly: pick a host based on
  stale(ish) data, rely on claims to trigger a re-schedule), _same exact_
  information that scheduling service used when making a placement
  decision, needs to be available to the compute service when testing the
  placement.
  
 
  This is not the case right now, and the ERT does not propose any way to
  solve it - (see how I hacked around needing to be able to get
  extra_specs when making claims in [3], without hammering the DB). The
  result will be that any resource that we add and needs user supplied
  info for scheduling an instance against it, will need a buggy
  re-implementation of gathering all the bits from the request that
  scheduler sees, to be able to work properly.
 
  This is obviously a bigger concern when we want to allow users to pass
  data (through image or flavor) that can affect scheduling, but still a
  huge concern IMHO.
 
  As I see that there are already BPs proposing to use this IMHO broken
  ERT ([4] for example), which will surely add to the proliferation of
  code that hacks around these design shortcomings in what is already a
  messy, but also crucial (for perf as well as features) bit of Nova code.
 
  I propose to revert [2] ASAP since it is still fresh, and see how we can
  come up with a cleaner design.
 
  Would like to hear opinions on this, before I propose the patch tho!
 
  Thanks all,
 
  Nikola
 
  [1] https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement
  [2] https://review.openstack.org/#/c/109643/
  [3] https://review.openstack.org/#/c/111782/
  [4] https://review.openstack.org/#/c/89893
 
  

Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

2014-08-19 Thread Sylvain Bauza


On 15/08/2014 15:35, Andrew Laski wrote:


On 08/14/2014 03:21 AM, Nikola Đipanov wrote:

On 08/13/2014 06:05 PM, Sylvain Bauza wrote:

On 13/08/2014 12:21, Sylvain Bauza wrote:

On 12/08/2014 22:06, Sylvain Bauza wrote:

On 12/08/2014 18:54, Nikola Đipanov wrote:

On 08/12/2014 04:49 PM, Sylvain Bauza wrote:

(sorry for reposting, missed 2 links...)

Hi Nikola,

On 12/08/2014 12:21, Nikola Đipanov wrote:

Hey Nova-istas,

While I was hacking on [1] I was considering how to approach the fact
that we now need to track one more thing (NUMA node utilization) in our
resources. I went with - I'll add it to compute nodes table thinking
it's a fundamental enough property of a compute host that it deserves to
be there, although I was considering  Extensible Resource Tracker at one
point (ERT from now on - see [2]) but looking at the code - it did not
seem to provide anything I desperately needed, so I went with keeping it
simple.

So fast-forward a few days, and I caught myself solving a problem that I
kept thinking ERT should have solved - but apparently hasn't, and I
think it is fundamentally a broken design without it - so I'd really
like to see it re-visited.

The problem can be described by the following lemma (if you take 'lemma'
to mean 'a sentence I came up with just now' :)):


Due to the way scheduling works in Nova (roughly: pick a host based on
stale(ish) data, rely on claims to trigger a re-schedule), _same exact_
information that scheduling service used when making a placement
decision, needs to be available to the compute service when testing the
placement.


This is not the case right now, and the ERT does not propose any way to
solve it - (see how I hacked around needing to be able to get
extra_specs when making claims in [3], without hammering the DB). The
result will be that any resource that we add and needs user supplied
info for scheduling an instance against it, will need a buggy
re-implementation of gathering all the bits from the request that
scheduler sees, to be able to work properly.

Well, ERT does provide a plugin mechanism for testing resources at the
claim level. This is the plugin responsibility to implement a test()
method [2.1] which will be called when test_claim() [2.2]

So, provided this method is implemented, a local host check can be done
based on the host's view of resources.



Yes - the problem is there is no clear API to get all the needed bits to
do so - especially the user supplied one from image and flavors.
On top of that, in current implementation we only pass a hand-wavy
'usage' blob in. This makes anyone wanting to use this in conjunction
with some of the user supplied bits roll their own
'extract_data_from_instance_metadata_flavor_image' or similar which is
horrible and also likely bad for performance.

I see your concern where there is no interface for user-facing
resources like flavor or image metadata.
I also think indeed that the big 'usage' blob is not a good choice
for long-term vision.

That said, I don't think we should, as we say in French, throw the baby
out with the bath water... i.e. the problem is with the RT, not the ERT
(apart from the mention of third-party API that you noted - I'll go to
it later below)

This is obviously a bigger concern when we want to allow users to pass
data (through image or flavor) that can affect scheduling, but still a
huge concern IMHO.

And here is where I agree with you : at the moment, ResourceTracker (and
consequently Extensible RT) only provides the view of the resources the
host is knowing (see my point above) and possibly some other resources
are missing.
So, whatever your choice of going with or without ERT, your patch [3]
still deserves it if we want not to lookup DB each time a claim goes.



As I see that there are already BPs proposing to use this IMHO broken
ERT ([4] for example), which will surely add to the proliferation of
code that hacks around these design shortcomings in what is already a
messy, but also crucial (for perf as well as features) bit of Nova code.

Two distinct implementations of that spec (ie. instances and flavors)
have been proposed [2.3] [2.4] so reviews are welcome. If you see the
test() method, it's no-op thing for both plugins. I'm open to comments
because I have the stated problem : how can we define a limit on just a
counter of instances and flavors ?


Will look at these - but none of them seem to hit the issue I am
complaining about, and that is that it will need to consider other
request data for claims, not only data available for on instances.

Also - the fact that you don't implement test() in flavor ones tells me
that the implementation is indeed racy (but it is racy atm as well) and
two requests can indeed race for the same host, and since no claims are
done, both can succeed. This is I believe (at least in case of single
flavor hosts) unlikely to happen in practice, but you get the idea.

Agreed, these 2 patches probably require another iteration, in

Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

2014-08-19 Thread Nikola Đipanov
Since after a week of discussing it I see no compelling argument against
reverting it - here's the proposal:

   https://review.openstack.org/115218

Thanks,
N.

On 08/12/2014 12:21 PM, Nikola Đipanov wrote:
 Hey Nova-istas,
 
 While I was hacking on [1] I was considering how to approach the fact
 that we now need to track one more thing (NUMA node utilization) in our
 resources. I went with - I'll add it to compute nodes table thinking
 it's a fundamental enough property of a compute host that it deserves to
 be there, although I was considering  Extensible Resource Tracker at one
 point (ERT from now on - see [2]) but looking at the code - it did not
 seem to provide anything I desperately needed, so I went with keeping it
 simple.
 
 So fast-forward a few days, and I caught myself solving a problem that I
 kept thinking ERT should have solved - but apparently hasn't, and I
 think it is fundamentally a broken design without it - so I'd really
 like to see it re-visited.
 
 The problem can be described by the following lemma (if you take 'lemma'
 to mean 'a sentence I came up with just now' :)):
 
 
 Due to the way scheduling works in Nova (roughly: pick a host based on
 stale(ish) data, rely on claims to trigger a re-schedule), _same exact_
 information that scheduling service used when making a placement
 decision, needs to be available to the compute service when testing the
 placement.
 
 
 This is not the case right now, and the ERT does not propose any way to
 solve it - (see how I hacked around needing to be able to get
 extra_specs when making claims in [3], without hammering the DB). The
 result will be that any resource that we add and needs user supplied
 info for scheduling an instance against it, will need a buggy
 re-implementation of gathering all the bits from the request that
 scheduler sees, to be able to work properly.
 
 This is obviously a bigger concern when we want to allow users to pass
 data (through image or flavor) that can affect scheduling, but still a
 huge concern IMHO.
 
 As I see that there are already BPs proposing to use this IMHO broken
 ERT ([4] for example), which will surely add to the proliferation of
 code that hacks around these design shortcomings in what is already a
 messy, but also crucial (for perf as well as features) bit of Nova code.
 
 I propose to revert [2] ASAP since it is still fresh, and see how we can
 come up with a cleaner design.
 
 Would like to hear opinions on this, before I propose the patch tho!
 
 Thanks all,
 
 Nikola
 
 [1] https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement
 [2] https://review.openstack.org/#/c/109643/
 [3] https://review.openstack.org/#/c/111782/
 [4] https://review.openstack.org/#/c/89893
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 




Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

2014-08-15 Thread Nikola Đipanov
On 08/14/2014 10:25 PM, Sylvain Bauza wrote:
 Hi mikal,
 
 On 14 August 2014 01:49, Michael Still mi...@stillhq.com wrote:

 So, there's been a lot of email in the last few days and I feel I am
 not keeping up.

 Sylvain, can you summarise for me what the plan is here? Can we roll
 forward or do we need to revert?
 
 Well, as we agreed with Nikola, the problem is not with ERT but RT, as
 the request data needs to be passed when claiming a resource.
 
 I'm proposing to keep ERT and only consider plugins that are not needing
 request_spec when claiming, but here there is no agreement yet.
 

Yes - we could do this, I still see no benefit in this.

FWIW - Jay Pipes made a comment that highlights much of the same issues
I did in this thread even before I started it, on the patch itself
(scroll down).

https://review.openstack.org/#/c/109643/

It's easy to miss since it was added post merge.

 Unfortunately, I'm on PTO till Tuesday, and Paul Murray this week as
 well. So I propose to delay the discussion by these days as that's not
 impacted by FPF.
 
 In the meantime, I created a patch for discussing a workaround [1] for
 Juno until we correctly figure out how to fix that issue, as it deserves
 a spec.
 
 Time is running out for Juno.

 
 Indeed, I'm mostly concerned by the example exception spec that Nikola
 mentioned [2] (isolate-scheduler-db) as it still needs a second +2 while
 FPF is in 1 week...
 I'm planning to deliver an alternative implementation without ERT wrt
 this discussion.
 

Ripping it out will make it more difficult for the Gantt team to go
ahead with the current plan for the split - yes, but maybe that actually
means you might want to re-visit some of your decision (did not follow
all of it, so don't want to comment in depth at this point, but throwing
it out there)?

N.

 -Sylvain
 
 [1] https://review.openstack.org/#/c/113936/
 
 [2] https://review.openstack.org/#/c/89893
 
 Thanks,
 Michael




Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

2014-08-15 Thread Sylvain Bauza
On 15 August 2014 08:16, Nikola Đipanov ndipa...@redhat.com wrote:

 On 08/14/2014 10:25 PM, Sylvain Bauza wrote:
  Hi mikal,
 
  On 14 August 2014 01:49, Michael Still mi...@stillhq.com wrote:
 
  So, there's been a lot of email in the last few days and I feel I am
  not keeping up.
 
  Sylvain, can you summarise for me what the plan is here? Can we roll
  forward or do we need to revert?
 
  Well, as we agreed with Nikola, the problem is not with ERT but RT, as
  the request data needs to be passed when claiming a resource.
 
  I'm proposing to keep ERT and only consider plugins that are not needing
  request_spec when claiming, but here there is no agreement yet.
 

 Yes - we could do this, I still see no benefit in this.

 FWIW - Jay Pipes made a comment that highlights much of the same issues
 I did in this thread even before I started it, on the patch itself
 (scroll down).

 https://review.openstack.org/#/c/109643/

 It's easy to miss since it was added post merge.


As said previously, I'm not saying that the current interface will never
have to change to address some issues. My personal concern is that I don't
want technical debt to block every attempt to move forward, when that debt
could be dealt with more easily in a separate project with new velocity.

In summary, I don't trust big bangs in Nova and prefer to make small
iterations on the current state.

That's why I'm in favour of a negotiated approach. Take the filters and the
work we are doing to remove direct access to the DB: anyone can propose a
patch that breaks that (and you know how easy it is to propose a filter and
how many people are doing that at the moment...), so it is the reviewers'
duty - and mine in particular - to speak up and make sure it does not get
merged.

Same idea here. This is not a REST API that anyone can consume; new plugins
still need to be merged if they want to go upstream.

  Unfortunately, I'm on PTO till Tuesday, and Paul Murray this week as
  well. So I propose to delay the discussion by these days as that's not
  impacted by FPF.
 
  In the meantime, I created a patch for discussing a workaround [1] for
  Juno until we correctly figure out how to fix that issue, as it deserves
  a spec.
 
  Time is running out for Juno.
 
 
  Indeed, I'm mostly concerned by the example exception spec that Nikola
  mentioned [2] (isolate-scheduler-db) as it still needs a second +2 while
  FPF is in 1 week...
  I'm planning to deliver an alternative implementation without ERT wrt
  this discussion.
 

 Ripping it out will make it more difficult for the Gantt team to go
 ahead with the current plan for the split - yes, but maybe that actually
 means you might want to re-visit some of your decision (did not follow
 all of it, so don't want to comment in depth at this point, but throwing
 it out there)?

 N.

Well, you raise a good point: what would the alternative be? This spec had
many proposals, as many solutions can be found, but we decided to go with
ERT because of its good integration with the existing RT.

I'm really open to discussion in the spec itself, as I really like hearing
other voices.


  -Sylvain
 
  [1] https://review.openstack.org/#/c/113936/
 
  [2] https://review.openstack.org/#/c/89893
 
  Thanks,
  Michael




Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

2014-08-15 Thread Andrew Laski


On 08/14/2014 03:21 AM, Nikola Đipanov wrote:

On 08/13/2014 06:05 PM, Sylvain Bauza wrote:

On 13/08/2014 12:21, Sylvain Bauza wrote:

On 12/08/2014 22:06, Sylvain Bauza wrote:

On 12/08/2014 18:54, Nikola Đipanov wrote:

On 08/12/2014 04:49 PM, Sylvain Bauza wrote:

(sorry for reposting, missed 2 links...)

Hi Nikola,

On 12/08/2014 12:21, Nikola Đipanov wrote:

Hey Nova-istas,

While I was hacking on [1] I was considering how to approach the fact
that we now need to track one more thing (NUMA node utilization)
in our
resources. I went with - I'll add it to compute nodes table
thinking
it's a fundamental enough property of a compute host that it
deserves to
be there, although I was considering  Extensible Resource Tracker
at one
point (ERT from now on - see [2]) but looking at the code - it did
not
seem to provide anything I desperately needed, so I went with
keeping it
simple.

So fast-forward a few days, and I caught myself solving a problem
that I
kept thinking ERT should have solved - but apparently hasn't, and I
think it is fundamentally a broken design without it - so I'd really
like to see it re-visited.

The problem can be described by the following lemma (if you take
'lemma'
to mean 'a sentence I came up with just now' :)):


Due to the way scheduling works in Nova (roughly: pick a host
based on
stale(ish) data, rely on claims to trigger a re-schedule), _same
exact_
information that scheduling service used when making a placement
decision, needs to be available to the compute service when
testing the
placement.


This is not the case right now, and the ERT does not propose any
way to
solve it - (see how I hacked around needing to be able to get
extra_specs when making claims in [3], without hammering the DB). The
result will be that any resource that we add and needs user supplied
info for scheduling an instance against it, will need a buggy
re-implementation of gathering all the bits from the request that
scheduler sees, to be able to work properly.

Well, ERT does provide a plugin mechanism for testing resources at the
claim level. This is the plugin responsibility to implement a test()
method [2.1] which will be called when test_claim() [2.2]

So, provided this method is implemented, a local host check can be
done
based on the host's view of resources.



Yes - the problem is there is no clear API to get all the needed
bits to
do so - especially the user supplied one from image and flavors.
On top of that, in current implementation we only pass a hand-wavy
'usage' blob in. This makes anyone wanting to use this in conjunction
with some of the user supplied bits roll their own
'extract_data_from_instance_metadata_flavor_image' or similar which is
horrible and also likely bad for performance.

I see your concern where there is no interface for user-facing
resources like flavor or image metadata.
I also think indeed that the big 'usage' blob is not a good choice
for long-term vision.

That said, I don't think we should, as we say in French, throw the baby
out with the bath water... i.e. the problem is with the RT, not the ERT
(apart from the mention of third-party API that you noted - I'll go to
it later below)

This is obviously a bigger concern when we want to allow users to
pass
data (through image or flavor) that can affect scheduling, but
still a
huge concern IMHO.

And here is where I agree with you : at the moment, ResourceTracker
(and
consequently Extensible RT) only provides the view of the resources
the
host is knowing (see my point above) and possibly some other resources
are missing.
So, whatever your choice of going with or without ERT, your patch [3]
still deserves it if we want not to lookup DB each time a claim goes.



As I see that there are already BPs proposing to use this IMHO broken
ERT ([4] for example), which will surely add to the proliferation of
code that hacks around these design shortcomings in what is already a
messy, but also crucial (for perf as well as features) bit of Nova
code.

Two distinct implementations of that spec (ie. instances and flavors)
have been proposed [2.3] [2.4] so reviews are welcome. If you see the
test() method, it's no-op thing for both plugins. I'm open to comments
because I have the stated problem : how can we define a limit on
just a
counter of instances and flavors ?


Will look at these - but none of them seem to hit the issue I am
complaining about, and that is that it will need to consider other
request data for claims, not only data available for on instances.

Also - the fact that you don't implement test() in flavor ones tells me
that the implementation is indeed racy (but it is racy atm as well) and
two requests can indeed race for the same host, and since no claims are
done, both can succeed. This is I believe (at least in case of single
flavor hosts) unlikely to happen in practice, but you get the idea.

Agreed, these 2 patches probably require another iteration, in
particular how we make sure that it won't be racy. So I need another
run to think 

Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

2014-08-14 Thread Nikola Đipanov
On 08/13/2014 06:05 PM, Sylvain Bauza wrote:
 
 On 13/08/2014 12:21, Sylvain Bauza wrote:

 On 12/08/2014 22:06, Sylvain Bauza wrote:

 On 12/08/2014 18:54, Nikola Đipanov wrote:
 On 08/12/2014 04:49 PM, Sylvain Bauza wrote:
 (sorry for reposting, missed 2 links...)

 Hi Nikola,

 On 12/08/2014 12:21, Nikola Đipanov wrote:
 Hey Nova-istas,

 While I was hacking on [1] I was considering how to approach the fact
 that we now need to track one more thing (NUMA node utilization)
 in our
 resources. I went with - I'll add it to compute nodes table
 thinking
 it's a fundamental enough property of a compute host that it
 deserves to
 be there, although I was considering  Extensible Resource Tracker
 at one
 point (ERT from now on - see [2]) but looking at the code - it did
 not
 seem to provide anything I desperately needed, so I went with
 keeping it
 simple.

 So fast-forward a few days, and I caught myself solving a problem
 that I
 kept thinking ERT should have solved - but apparently hasn't, and I
 think it is fundamentally a broken design without it - so I'd really
 like to see it re-visited.

 The problem can be described by the following lemma (if you take
 'lemma'
 to mean 'a sentence I came up with just now' :)):

 
 Due to the way scheduling works in Nova (roughly: pick a host
 based on
 stale(ish) data, rely on claims to trigger a re-schedule), _same
 exact_
 information that scheduling service used when making a placement
 decision, needs to be available to the compute service when
 testing the
 placement.
 

 This is not the case right now, and the ERT does not propose any
 way to
 solve it - (see how I hacked around needing to be able to get
 extra_specs when making claims in [3], without hammering the DB). The
 result will be that any resource that we add and needs user supplied
 info for scheduling an instance against it, will need a buggy
 re-implementation of gathering all the bits from the request that
 scheduler sees, to be able to work properly.
 Well, ERT does provide a plugin mechanism for testing resources at the
 claim level. This is the plugin responsibility to implement a test()
 method [2.1] which will be called when test_claim() [2.2]

 So, provided this method is implemented, a local host check can be
 done
 based on the host's view of resources.


 Yes - the problem is there is no clear API to get all the needed
 bits to
 do so - especially the user supplied one from image and flavors.
 On top of that, in current implementation we only pass a hand-wavy
 'usage' blob in. This makes anyone wanting to use this in conjunction
 with some of the user supplied bits roll their own
 'extract_data_from_instance_metadata_flavor_image' or similar which is
 horrible and also likely bad for performance.

 I see your concern where there is no interface for user-facing
 resources like flavor or image metadata.
 I also think indeed that the big 'usage' blob is not a good choice
 for long-term vision.

 That said, I don't think we should, as we say in French, throw the baby
 out with the bath water... i.e. the problem is with the RT, not the ERT
 (apart from the mention of third-party API that you noted - I'll go to
 it later below)
 This is obviously a bigger concern when we want to allow users to
 pass
 data (through image or flavor) that can affect scheduling, but
 still a
 huge concern IMHO.
 And here is where I agree with you : at the moment, ResourceTracker
 (and
 consequently Extensible RT) only provides the view of the resources
 the
 host is knowing (see my point above) and possibly some other resources
 are missing.
 So, whatever your choice of going with or without ERT, your patch [3]
 still deserves it if we want not to lookup DB each time a claim goes.


 As I see that there are already BPs proposing to use this IMHO broken
 ERT ([4] for example), which will surely add to the proliferation of
 code that hacks around these design shortcomings in what is already a
 messy, but also crucial (for perf as well as features) bit of Nova
 code.
 Two distinct implementations of that spec (ie. instances and flavors)
 have been proposed [2.3] [2.4] so reviews are welcome. If you see the
 test() method, it's no-op thing for both plugins. I'm open to comments
 because I have the stated problem : how can we define a limit on
 just a
 counter of instances and flavors ?

 Will look at these - but none of them seem to hit the issue I am
 complaining about, and that is that it will need to consider other
 request data for claims, not only data available for on instances.

 Also - the fact that you don't implement test() in flavor ones tells me
 that the implementation is indeed racy (but it is racy atm as well) and
 two requests can indeed race for the same host, and since no claims are
 done, both can succeed. This is I believe (at least in case of single
 flavor hosts) unlikely to happen in practice, but you get the idea.

 Agreed, these 2 patches probably require another iteration, in
 particular how we make sure that it 

Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

2014-08-14 Thread Sylvain Bauza
Hi mikal,

On 14 August 2014 01:49, Michael Still mi...@stillhq.com wrote:

 So, there's been a lot of email in the last few days and I feel I am
 not keeping up.

 Sylvain, can you summarise for me what the plan is here? Can we roll
 forward or do we need to revert?

Well, as we agreed with Nikola, the problem is not with ERT but RT, as the
request data needs to be passed when claiming a resource.

I'm proposing to keep ERT and only consider plugins that are not needing
request_spec when claiming, but here there is no agreement yet.
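
As a rough illustration of that split, here is a sketch in Python. It is
only an assumption about how such plugins could look, not the real ERT
interface: the class names, the can_claim() helper, the optional
request_spec argument and the "example:cells" extra_specs key are all
hypothetical.

```python
# Hypothetical sketch: these classes are illustrative, not Nova's real ERT API.
class InstanceCountPlugin:
    """Needs only the host's own view: how many instances are running."""

    def __init__(self, running, limit):
        self.running = running
        self.limit = limit

    def test(self, usage, request_spec=None):
        # request_spec is accepted but never read by this plugin.
        return self.running + 1 <= self.limit


class CellPlugin:
    """Needs user-supplied data (flavor extra_specs) to test a claim."""

    def __init__(self, free_cells):
        self.free_cells = free_cells

    def test(self, usage, request_spec=None):
        if request_spec is None:
            # Without the data the scheduler used, the check cannot be repeated.
            raise ValueError("request_spec is required to test this claim")
        # "example:cells" is a made-up key used only for this sketch.
        wanted = int(request_spec["extra_specs"].get("example:cells", 1))
        return wanted <= self.free_cells


def can_claim(plugins, usage, request_spec=None):
    """A claim succeeds only if every enabled plugin accepts it."""
    return all(plugin.test(usage, request_spec) for plugin in plugins)


# A plugin of the first kind works with what the RT passes today.
print(can_claim([InstanceCountPlugin(running=3, limit=4)], usage={}))

# A plugin of the second kind only works once request data is passed in.
spec = {"extra_specs": {"example:cells": "2"}}
print(can_claim([CellPlugin(free_cells=2)], usage={}, request_spec=spec))
```

Plugins like the first one could be kept for Juno under this proposal, while
plugins like the second one would wait until the RT passes request data at
claim time.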

Unfortunately, I'm on PTO till Tuesday, and Paul Murray this week as well.
So I propose to delay the discussion by these days as that's not impacted
by FPF.

In the meantime, I created a patch for discussing a workaround [1] for Juno
until we correctly figure out how to fix that issue, as it deserves a spec.

 Time is running out for Juno.


Indeed, I'm mostly concerned by the example exception spec that Nikola
mentioned [2] (isolate-scheduler-db) as it still needs a second +2 while
FPF is in 1 week...
I'm planning to deliver an alternative implementation without ERT wrt this
discussion.

-Sylvain

[1] https://review.openstack.org/#/c/113936/

[2] https://review.openstack.org/#/c/89893

 Thanks,
 Michael

 On Thu, Aug 14, 2014 at 3:40 AM, Sylvain Bauza sba...@redhat.com wrote:
 
  On 13/08/2014 18:40, Brian Elliott wrote:
 
  On Aug 12, 2014, at 5:21 AM, Nikola Đipanov ndipa...@redhat.com wrote:
 
  Hey Nova-istas,
 
  While I was hacking on [1] I was considering how to approach the fact
  that we now need to track one more thing (NUMA node utilization) in our
  resources. I went with - I'll add it to compute nodes table thinking
  it's a fundamental enough property of a compute host that it deserves to
  be there, although I was considering  Extensible Resource Tracker at one
  point (ERT from now on - see [2]) but looking at the code - it did not
  seem to provide anything I desperately needed, so I went with keeping it
  simple.
 
  So fast-forward a few days, and I caught myself solving a problem that I
  kept thinking ERT should have solved - but apparently hasn't, and I
  think it is fundamentally a broken design without it - so I'd really
  like to see it re-visited.
 
  The problem can be described by the following lemma (if you take 'lemma'
  to mean 'a sentence I came up with just now' :)):
 
  
  Due to the way scheduling works in Nova (roughly: pick a host based on
  stale(ish) data, rely on claims to trigger a re-schedule), _same exact_
  information that scheduling service used when making a placement
  decision, needs to be available to the compute service when testing the
  placement.
  “
 
  Correct
 
  This is not the case right now, and the ERT does not propose any way to
  solve it - (see how I hacked around needing to be able to get
  extra_specs when making claims in [3], without hammering the DB). The
  result will be that any resource that we add and needs user supplied
  info for scheduling an instance against it, will need a buggy
  re-implementation of gathering all the bits from the request that
  scheduler sees, to be able to work properly.
 
  Agreed, ERT does not attempt to solve this problem of ensuring RT has an
  identical set of information for testing claims.  I don’t think it was
  intended to.
 
  ERT does solve the issue of bloat in the RT with adding
  just-one-more-thing to test usage-wise.  It gives a nice hook for inserting
  your claim logic for your specific use case.
 
 
  I think Nikola and I agreed on the fact that ERT is not responsible for this
  design. That said I can talk on behalf of Nikola...
 
 
 
  This is obviously a bigger concern when we want to allow users to pass
  data (through image or flavor) that can affect scheduling, but still a
  huge concern IMHO.
 
  I think passing additional data through to compute just wasn’t a problem
  that ERT aimed to solve.  (Paul Murray?)  That being said, coordinating the
  passing of any extra data required to test a claim that is *not* sourced
  from the host itself would be a very nice addition.  You are working around
  it with some caching in your flavor db lookup use case, although one could
  of course cook up a cleaner patch to pass such data through on the “build
  this” request to the compute.
 
 
  Indeed, and that's why I think the problem can be resolved thanks to 2
  different things:
  1. Filters need to look at what ERT is giving them; that's what
  isolate-scheduler-db is trying to do (see my patches [2.3] and [2.4] in the
  previous emails).
  2. Some extra user request data needs to be checked in the test() method of
  ERT plugins (where claims are done), so I provided a WIP patch for
  discussing it: https://review.openstack.org/#/c/113936/
 
 
 
  As I see that there are already BPs proposing to use this IMHO broken
  ERT ([4] for example), which will surely add to the proliferation of
  code that hacks around these design shortcomings in what is already a
  

Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

2014-08-13 Thread Sylvain Bauza


On 12/08/2014 22:06, Sylvain Bauza wrote:


On 12/08/2014 18:54, Nikola Đipanov wrote:

On 08/12/2014 04:49 PM, Sylvain Bauza wrote:

(sorry for reposting, missed 2 links...)

Hi Nikola,

On 12/08/2014 12:21, Nikola Đipanov wrote:

Hey Nova-istas,

While I was hacking on [1] I was considering how to approach the fact
that we now need to track one more thing (NUMA node utilization) in our
resources. I went with - I'll add it to compute nodes table thinking
it's a fundamental enough property of a compute host that it deserves to
be there, although I was considering Extensible Resource Tracker at one
point (ERT from now on - see [2]) but looking at the code - it did not
seem to provide anything I desperately needed, so I went with keeping it
simple.

So fast-forward a few days, and I caught myself solving a problem that I
kept thinking ERT should have solved - but apparently hasn't, and I
think it is fundamentally a broken design without it - so I'd really
like to see it re-visited.

The problem can be described by the following lemma (if you take 'lemma'
to mean 'a sentence I came up with just now' :)):


Due to the way scheduling works in Nova (roughly: pick a host based on
stale(ish) data, rely on claims to trigger a re-schedule), _same exact_
information that scheduling service used when making a placement
decision, needs to be available to the compute service when testing the
placement.


This is not the case right now, and the ERT does not propose any way to
solve it - (see how I hacked around needing to be able to get
extra_specs when making claims in [3], without hammering the DB). The
result will be that any resource that we add and needs user supplied
info for scheduling an instance against it, will need a buggy
re-implementation of gathering all the bits from the request that
scheduler sees, to be able to work properly.

Well, ERT does provide a plugin mechanism for testing resources at the
claim level. It is the plugin's responsibility to implement a test()
method [2.1], which will be called when test_claim() [2.2] runs.

So, provided this method is implemented, a local host check can be done
based on the host's view of resources.



Yes - the problem is there is no clear API to get all the needed bits to
do so - especially the user supplied one from image and flavors.
On top of that, in current implementation we only pass a hand-wavy
'usage' blob in. This makes anyone wanting to use this in conjunction
with some of the user supplied bits roll their own
'extract_data_from_instance_metadata_flavor_image' or similar which is
horrible and also likely bad for performance.


I see your concern that there is no interface for user-facing
resources like flavor or image metadata.
I also think that the big 'usage' blob is not a good choice for
the long term.


That said, I don't think we should, as we say in French, throw the baby out
with the bath water, i.e. the problem is with the RT, not the ERT (apart from
the mention of third-party API that you noted; I'll come to it later below).

This is obviously a bigger concern when we want to allow users to pass
data (through image or flavor) that can affect scheduling, but still a
huge concern IMHO.
And here is where I agree with you: at the moment, ResourceTracker (and
consequently Extensible RT) only provides the host's own view of its
resources (see my point above), and some other resources are possibly
missing.
So, whether you go with or without ERT, your patch [3] is still
worthwhile if we don't want to look up the DB each time a claim is made.



As I see that there are already BPs proposing to use this IMHO broken
ERT ([4] for example), which will surely add to the proliferation of
code that hacks around these design shortcomings in what is already a
messy, but also crucial (for perf as well as features) bit of Nova code.

Two distinct implementations of that spec (i.e. instances and flavors)
have been proposed [2.3] [2.4], so reviews are welcome. If you look at the
test() method, it is a no-op for both plugins. I'm open to comments
because I have the stated problem: how can we define a limit on just a
counter of instances and flavors?


Will look at these - but none of them seem to hit the issue I am
complaining about, and that is that it will need to consider other
request data for claims, not only data available on instances.

Also - the fact that you don't implement test() in flavor ones tells me
that the implementation is indeed racy (but it is racy atm as well) and
two requests can indeed race for the same host, and since no claims are
done, both can succeed. This is I believe (at least in case of single
flavor hosts) unlikely to happen in practice, but you get the idea.
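
The effect of a no-op test() can be sketched roughly as follows. This is a minimal illustration only, not Nova code: the class names, the single integer capacity and the claim() helper are all hypothetical, and claims on one host are assumed to be processed one at a time. Two requests are placed on the same host from the same stale scheduler view; with a no-op test() both are accepted, while a test() that checks current usage rejects the second.

```python
class NoOpPlugin:
    """Hypothetical plugin whose test() never rejects a request."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def test(self, requested):
        # No check at claim time: every request is accepted.
        return True

    def add(self, requested):
        self.used += requested


class CheckingPlugin(NoOpPlugin):
    """Hypothetical plugin that checks current usage at claim time."""

    def test(self, requested):
        return self.used + requested <= self.capacity


def claim(plugin, requested):
    """Test the request against the host's current view, then consume it."""
    if not plugin.test(requested):
        return False  # the caller would trigger a re-schedule
    plugin.add(requested)
    return True


# Both requests were scheduled while the host still appeared to have one slot.
noop = NoOpPlugin(capacity=1)
noop_results = [claim(noop, 1), claim(noop, 1)]

checking = CheckingPlugin(capacity=1)
checking_results = [claim(checking, 1), claim(checking, 1)]
```

With the no-op plugin the host ends up over capacity; with the checking plugin the second claim fails and the request can go back to the scheduler.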


Agreed, these 2 patches probably require another iteration, in 
particular how we make sure that it won't be racy. So I need another 
run to think about what to test() for these 2 examples.
Another patch has to be done for aggregates, but it's still WIP so 

Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

2014-08-13 Thread Sylvain Bauza


On 13/08/2014 12:21, Sylvain Bauza wrote:



Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

2014-08-13 Thread Brian Elliott

On Aug 12, 2014, at 5:21 AM, Nikola Đipanov ndipa...@redhat.com wrote:

 Hey Nova-istas,
 
 While I was hacking on [1] I was considering how to approach the fact
 that we now need to track one more thing (NUMA node utilization) in our
 resources. I went with - I'll add it to compute nodes table thinking
 it's a fundamental enough property of a compute host that it deserves to
 be there, although I was considering  Extensible Resource Tracker at one
 point (ERT from now on - see [2]) but looking at the code - it did not
 seem to provide anything I desperately needed, so I went with keeping it
 simple.
 
 So fast-forward a few days, and I caught myself solving a problem that I
 kept thinking ERT should have solved - but apparently hasn't, and I
 think it is fundamentally a broken design without it - so I'd really
 like to see it re-visited.
 
 The problem can be described by the following lemma (if you take 'lemma'
 to mean 'a sentence I came up with just now' :)):
 
 
 “Due to the way scheduling works in Nova (roughly: pick a host based on
 stale(ish) data, rely on claims to trigger a re-schedule), _same exact_
 information that scheduling service used when making a placement
 decision, needs to be available to the compute service when testing the
 placement.”

Correct

 
 This is not the case right now, and the ERT does not propose any way to
 solve it - (see how I hacked around needing to be able to get
 extra_specs when making claims in [3], without hammering the DB). The
 result will be that any resource that we add and needs user supplied
 info for scheduling an instance against it, will need a buggy
 re-implementation of gathering all the bits from the request that
 scheduler sees, to be able to work properly.
Agreed, ERT does not attempt to solve this problem of ensuring RT has an 
identical set of information for testing claims.  I don’t think it was intended 
to.

ERT does solve the issue of bloat in the RT with adding just-one-more-thing to 
test usage-wise.  It gives a nice hook for inserting your claim logic for your 
specific use case.

 
 This is obviously a bigger concern when we want to allow users to pass
 data (through image or flavor) that can affect scheduling, but still a
 huge concern IMHO.
I think passing additional data through to compute just wasn’t a problem that 
ERT aimed to solve.  (Paul Murray?)  That being said, coordinating the passing 
of any extra data required to test a claim that is *not* sourced from the host 
itself would be a very nice addition.  You are working around it with some 
caching in your flavor db lookup use case, although one could of course cook up 
a cleaner patch to pass such data through on the “build this” request to the 
compute.
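
A rough sketch of that idea, passing the request data along with the build request so the compute side tests exactly what the scheduler saw. Everything here is an assumption made for illustration: the "example:widgets" extra_specs key, the dictionary shapes and the function names are hypothetical and not part of Nova.

```python
def widgets_requested(request):
    """Single place that extracts the requirement from the request."""
    return int(request["extra_specs"].get("example:widgets", 0))


def scheduler_filter(host_state, request):
    """Scheduler side: decide placement from (possibly stale) host data."""
    return host_state["free_widgets"] >= widgets_requested(request)


def compute_claim(local_resources, request):
    """Compute side: re-test the same request against the current host view."""
    wanted = widgets_requested(request)
    if local_resources["free_widgets"] < wanted:
        return False  # claim fails; the caller would trigger a re-schedule
    local_resources["free_widgets"] -= wanted
    return True


# The request travels unchanged from the scheduler to the compute node.
request = {"extra_specs": {"example:widgets": "2"}}

stale_view = {"free_widgets": 4}   # what the scheduler believes
actual_view = {"free_widgets": 1}  # what the host really has left

passed_filter = scheduler_filter(stale_view, request)
claimed = compute_claim(actual_view, request)
```

Because both sides derive the requirement from the same request object, no second lookup of flavor or image data is needed at claim time.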

 
 As I see that there are already BPs proposing to use this IMHO broken
 ERT ([4] for example), which will surely add to the proliferation of
 code that hacks around these design shortcomings in what is already a
 messy, but also crucial (for perf as well as features) bit of Nova code.
 
 I propose to revert [2] ASAP since it is still fresh, and see how we can
 come up with a cleaner design.
 
I think the ERT is forward-progress here, but am willing to review 
patches/specs on improvements/replacements.  

 Would like to hear opinions on this, before I propose the patch tho!
 
 Thanks all,
 
 Nikola
 
 [1] https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement
 [2] https://review.openstack.org/#/c/109643/
 [3] https://review.openstack.org/#/c/111782/
 [4] https://review.openstack.org/#/c/89893
 
 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

2014-08-13 Thread Sylvain Bauza


On 13/08/2014 18:40, Brian Elliott wrote:

On Aug 12, 2014, at 5:21 AM, Nikola Đipanov ndipa...@redhat.com wrote:


Hey Nova-istas,

While I was hacking on [1] I was considering how to approach the fact
that we now need to track one more thing (NUMA node utilization) in our
resources. I went with - I'll add it to compute nodes table thinking
it's a fundamental enough property of a compute host that it deserves to
be there, although I was considering  Extensible Resource Tracker at one
point (ERT from now on - see [2]) but looking at the code - it did not
seem to provide anything I desperately needed, so I went with keeping it
simple.

So fast-forward a few days, and I caught myself solving a problem that I
kept thinking ERT should have solved - but apparently hasn't, and I
think it is fundamentally a broken design without it - so I'd really
like to see it re-visited.

The problem can be described by the following lemma (if you take 'lemma'
to mean 'a sentence I came up with just now' :)):


“Due to the way scheduling works in Nova (roughly: pick a host based on
stale(ish) data, rely on claims to trigger a re-schedule), _same exact_
information that scheduling service used when making a placement
decision, needs to be available to the compute service when testing the
placement.”

Correct


This is not the case right now, and the ERT does not propose any way to
solve it - (see how I hacked around needing to be able to get
extra_specs when making claims in [3], without hammering the DB). The
result will be that any resource that we add and needs user supplied
info for scheduling an instance against it, will need a buggy
re-implementation of gathering all the bits from the request that
scheduler sees, to be able to work properly.

Agreed, ERT does not attempt to solve this problem of ensuring RT has an 
identical set of information for testing claims.  I don’t think it was intended 
to.

ERT does solve the issue of bloat in the RT with adding just-one-more-thing to 
test usage-wise.  It gives a nice hook for inserting your claim logic for your 
specific use case.


I think Nikola and I agreed on the fact that ERT is not responsible for 
this design. That said I can talk on behalf of Nikola...




This is obviously a bigger concern when we want to allow users to pass
data (through image or flavor) that can affect scheduling, but still a
huge concern IMHO.

I think passing additional data through to compute just wasn’t a problem that 
ERT aimed to solve.  (Paul Murray?)  That being said, coordinating the passing 
of any extra data required to test a claim that is *not* sourced from the host 
itself would be a very nice addition.  You are working around it with some 
caching in your flavor db lookup use case, although one could of course cook up 
a cleaner patch to pass such data through on the “build this” request to the 
compute.


Indeed, and that's why I think the problem can be resolved thanks to 2
different things:
1. Filters need to look at what ERT is giving them; that's what
isolate-scheduler-db is trying to do (see my patches [2.3] and [2.4] in
the previous emails).
2. Some extra user request data needs to be checked in the test() method of
ERT plugins (where claims are done), so I provided a WIP patch for
discussing it: https://review.openstack.org/#/c/113936/




As I see that there are already BPs proposing to use this IMHO broken
ERT ([4] for example), which will surely add to the proliferation of
code that hacks around these design shortcomings in what is already a
messy, but also crucial (for perf as well as features) bit of Nova code.

I propose to revert [2] ASAP since it is still fresh, and see how we can
come up with a cleaner design.


I think the ERT is forward-progress here, but am willing to review 
patches/specs on improvements/replacements.


Sure, your comments are welcome on https://review.openstack.org/#/c/113373/
You can find an example where the TypeAffinity filter is modified to look at
HostState, and where ERT is used for updating HostState and for
claiming resources.






Would like to hear opinions on this, before I propose the patch tho!

Thanks all,

Nikola

[1] https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement
[2] https://review.openstack.org/#/c/109643/
[3] https://review.openstack.org/#/c/111782/
[4] https://review.openstack.org/#/c/89893



Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

2014-08-13 Thread Michael Still
So, there's been a lot of email in the last few days and I feel I am
not keeping up.

Sylvain, can you summarise for me what the plan is here? Can we roll
forward or do we need to revert? Time is running out for Juno.

Thanks,
Michael

On Thu, Aug 14, 2014 at 3:40 AM, Sylvain Bauza sba...@redhat.com wrote:

 On 13/08/2014 18:40, Brian Elliott wrote:

 On Aug 12, 2014, at 5:21 AM, Nikola Đipanov ndipa...@redhat.com wrote:

 Hey Nova-istas,

 While I was hacking on [1] I was considering how to approach the fact
 that we now need to track one more thing (NUMA node utilization) in our
 resources. I went with - I'll add it to compute nodes table thinking
 it's a fundamental enough property of a compute host that it deserves to
 be there, although I was considering  Extensible Resource Tracker at one
 point (ERT from now on - see [2]) but looking at the code - it did not
 seem to provide anything I desperately needed, so I went with keeping it
 simple.

 So fast-forward a few days, and I caught myself solving a problem that I
 kept thinking ERT should have solved - but apparently hasn't, and I
 think it is fundamentally a broken design without it - so I'd really
 like to see it re-visited.

 The problem can be described by the following lemma (if you take 'lemma'
 to mean 'a sentence I came up with just now' :)):

 
 “Due to the way scheduling works in Nova (roughly: pick a host based on
 stale(ish) data, rely on claims to trigger a re-schedule), _same exact_
 information that scheduling service used when making a placement
 decision, needs to be available to the compute service when testing the
 placement.”

 Correct

 This is not the case right now, and the ERT does not propose any way to
 solve it - (see how I hacked around needing to be able to get
 extra_specs when making claims in [3], without hammering the DB). The
 result will be that any resource that we add and needs user supplied
 info for scheduling an instance against it, will need a buggy
 re-implementation of gathering all the bits from the request that
 scheduler sees, to be able to work properly.

 Agreed, ERT does not attempt to solve this problem of ensuring RT has an
 identical set of information for testing claims.  I don’t think it was
 intended to.

 ERT does solve the issue of bloat in the RT with adding
 just-one-more-thing to test usage-wise.  It gives a nice hook for inserting
 your claim logic for your specific use case.


 I think Nikola and I agreed on the fact that ERT is not responsible for this
 design. That said I can talk on behalf of Nikola...



 This is obviously a bigger concern when we want to allow users to pass
 data (through image or flavor) that can affect scheduling, but still a
 huge concern IMHO.

 I think passing additional data through to compute just wasn’t a problem
 that ERT aimed to solve.  (Paul Murray?)  That being said, coordinating the
 passing of any extra data required to test a claim that is *not* sourced
 from the host itself would be a very nice addition.  You are working around
 it with some caching in your flavor db lookup use case, although one could
 of course cook up a cleaner patch to pass such data through on the “build
 this” request to the compute.


 Indeed, and that's why I think the problem can be resolved thanks to 2
 different things :
 1. Filters need to look at what ERT is giving them, that's what
 isolate-scheduler-db is trying to do (see my patches [2.3 and 2.4] on the
 previous emails
 2. Some extra user request needs to be checked in the test() method of ERT
 plugins (where claims are done), so I provided a WIP patch for discussing it
 : https://review.openstack.org/#/c/113936/



 As I see that there are already BPs proposing to use this IMHO broken
 ERT ([4] for example), which will surely add to the proliferation of
 code that hacks around these design shortcomings in what is already a
 messy, but also crucial (for perf as well as features) bit of Nova code.

 I propose to revert [2] ASAP since it is still fresh, and see how we can
 come up with a cleaner design.

 I think the ERT is forward-progress here, but am willing to review
 patches/specs on improvements/replacements.


 Sure, your comments are welcome on https://review.openstack.org/#/c/113373/
 You can find an example where TypeAffinity filter is modified to look at
 HostState and where ERT is being used for updating HostState and for
 claiming resource.




 Would like to hear opinions on this, before I propose the patch tho!

 Thanks all,

 Nikola

 [1]
 https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement
 [2] https://review.openstack.org/#/c/109643/
 [3] https://review.openstack.org/#/c/111782/
 [4] https://review.openstack.org/#/c/89893


Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

2014-08-13 Thread Jiang, Yunhong


 -Original Message-
 From: Nikola Đipanov [mailto:ndipa...@redhat.com]
 Sent: Tuesday, August 12, 2014 3:22 AM
 To: OpenStack Development Mailing List
 Subject: [openstack-dev] [Nova] Concerns around the Extensible Resource
 Tracker design - revert maybe?
 
 Hey Nova-istas,
 
 While I was hacking on [1] I was considering how to approach the fact
 that we now need to track one more thing (NUMA node utilization) in our
 resources. I went with - I'll add it to compute nodes table thinking
 it's a fundamental enough property of a compute host that it deserves to
 be there, although I was considering  Extensible Resource Tracker at one
 point (ERT from now on - see [2]) but looking at the code - it did not
 seem to provide anything I desperately needed, so I went with keeping it
 simple.
 
 So fast-forward a few days, and I caught myself solving a problem that I
 kept thinking ERT should have solved - but apparently hasn't, and I
 think it is fundamentally a broken design without it - so I'd really
 like to see it re-visited.
 
 The problem can be described by the following lemma (if you take 'lemma'
 to mean 'a sentence I came up with just now' :)):
 
 
 Due to the way scheduling works in Nova (roughly: pick a host based on
 stale(ish) data, rely on claims to trigger a re-schedule), _same exact_
 information that scheduling service used when making a placement
 decision, needs to be available to the compute service when testing the
 placement.
 
 
 This is not the case right now, and the ERT does not propose any way to
 solve it - (see how I hacked around needing to be able to get
 extra_specs when making claims in [3], without hammering the DB). The
 result will be that any resource that we add and needs user supplied
 info for scheduling an instance against it, will need a buggy
 re-implementation of gathering all the bits from the request that
 scheduler sees, to be able to work properly.
 
 This is obviously a bigger concern when we want to allow users to pass
 data (through image or flavor) that can affect scheduling, but still a
 huge concern IMHO.

I think this is not an issue with ERT itself, but rather an RT issue. The same
issue affects PCI, which has to save the PCI request in the system metadata
(there was no ERT at that time).
It would be great to have a more generic solution.

Thanks
-jyh



Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

2014-08-12 Thread Sylvain Bauza

Hi Nikola,

On 12/08/2014 12:21, Nikola Đipanov wrote:

Hey Nova-istas,

While I was hacking on [1] I was considering how to approach the fact
that we now need to track one more thing (NUMA node utilization) in our
resources. I went with - I'll add it to compute nodes table thinking
it's a fundamental enough property of a compute host that it deserves to
be there, although I was considering  Extensible Resource Tracker at one
point (ERT from now on - see [2]) but looking at the code - it did not
seem to provide anything I desperately needed, so I went with keeping it
simple.

So fast-forward a few days, and I caught myself solving a problem that I
kept thinking ERT should have solved - but apparently hasn't, and I
think it is fundamentally a broken design without it - so I'd really
like to see it re-visited.

The problem can be described by the following lemma (if you take 'lemma'
to mean 'a sentence I came up with just now' :)):


Due to the way scheduling works in Nova (roughly: pick a host based on
stale(ish) data, rely on claims to trigger a re-schedule), _same exact_
information that scheduling service used when making a placement
decision, needs to be available to the compute service when testing the
placement.


This is not the case right now, and the ERT does not propose any way to
solve it - (see how I hacked around needing to be able to get
extra_specs when making claims in [3], without hammering the DB). The
result will be that any resource that we add and needs user supplied
info for scheduling an instance against it, will need a buggy
re-implementation of gathering all the bits from the request that
scheduler sees, to be able to work properly.


Well, ERT does provide a plugin mechanism for testing resources at the
claim level. It is the plugin's responsibility to implement a test()
method [2.1], which will be called when test_claim() [2.2] runs.


So, provided this method is implemented, a local host check can be done 
based on the host's view of resources.
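
The shape of such a local check can be sketched roughly as follows. This is a minimal illustration and not the actual Nova plugin interface: the class, the check_claim() helper (standing in for the role of test_claim() [2.2]), and the 'instances' key in the usage and limits dictionaries are all hypothetical.

```python
class InstanceCountResource:
    """Hypothetical ERT-style plugin that tracks a count of instances."""

    def __init__(self, max_instances):
        self.max_instances = max_instances
        self.used = 0

    def test(self, usage, limits):
        """Return None if the usage fits, otherwise a reason string."""
        limit = limits.get("instances", self.max_instances)
        requested = usage.get("instances", 1)
        if self.used + requested > limit:
            return "instances: %d used + %d requested exceeds limit %d" % (
                self.used, requested, limit)
        return None

    def add_instance(self, usage):
        """Record the usage once a claim has been accepted."""
        self.used += usage.get("instances", 1)


def check_claim(plugins, usage, limits):
    """Run every plugin's test() and collect the failure reasons."""
    reasons = [plugin.test(usage, limits) for plugin in plugins]
    return [reason for reason in reasons if reason is not None]


plugin = InstanceCountResource(max_instances=2)
first = check_claim([plugin], {"instances": 1}, {})
plugin.add_instance({"instances": 2})
second = check_claim([plugin], {"instances": 1}, {})
```

Note that test() here only sees the 'usage' and 'limits' values it is handed, which is the host's own view; nothing from the original request (flavor extra_specs, image properties) reaches it unless it is put into one of those arguments.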




This is obviously a bigger concern when we want to allow users to pass
data (through image or flavor) that can affect scheduling, but still a
huge concern IMHO.


And here is where I agree with you: at the moment, ResourceTracker (and
consequently Extensible RT) only provides the host's own view of its
resources (see my point above), and some other resources are possibly
missing.
So, whether you go with or without ERT, your patch [3] is still
worthwhile if we don't want to look up the DB each time a claim is made.




I see that there are already BPs proposing to use this IMHO broken
ERT ([4] for example), which will surely add to the proliferation of
code that hacks around these design shortcomings in what is already a
messy, but also crucial (for perf as well as features), bit of Nova code.


Two distinct implementations of that spec (i.e. instances and flavors)
have been proposed [2.3] [2.4], so reviews are welcome. If you look at the
test() method, it is a no-op in both plugins. I'm open to comments
because I have the following problem: how can we define a limit on just a
counter of instances and flavors?





I propose to revert [2] ASAP since it is still fresh, and see how we can
come up with a cleaner design.

Would like to hear opinions on this, before I propose the patch tho!


IMHO, the problem is more likely that the regular RT lacks some
information for each host, so it has to be handled on a case-by-case
basis, but I don't think ERT either increases complexity or creates
another issue.



Thanks,
-Sylvain


Thanks all,

Nikola

[1] https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement
[2] https://review.openstack.org/#/c/109643/
[3] https://review.openstack.org/#/c/111782/
[4] https://review.openstack.org/#/c/89893

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[2.1] https://github.com/openstack/nova/blob/master/nova/compute/resources/__init__.py#L75
[2.2] https://github.com/openstack/nova/blob/master/nova/compute/claims.py#L134




Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

2014-08-12 Thread Sylvain Bauza

(sorry for reposting, missed 2 links...)

Hi Nikola,

On 12/08/2014 12:21, Nikola Đipanov wrote:

Hey Nova-istas,

While I was hacking on [1] I was considering how to approach the fact
that we now need to track one more thing (NUMA node utilization) in our
resources. I went with - I'll add it to compute nodes table thinking
it's a fundamental enough property of a compute host that it deserves to
be there, although I was considering the Extensible Resource Tracker at one
point (ERT from now on - see [2]) but looking at the code - it did not
seem to provide anything I desperately needed, so I went with keeping it
simple.

So fast-forward a few days, and I caught myself solving a problem that I
kept thinking ERT should have solved - but apparently hasn't, and I
think it is fundamentally a broken design without it - so I'd really
like to see it re-visited.

The problem can be described by the following lemma (if you take 'lemma'
to mean 'a sentence I came up with just now' :)):


Due to the way scheduling works in Nova (roughly: pick a host based on
stale(ish) data, rely on claims to trigger a re-schedule), the _exact same_
information that the scheduling service used when making a placement
decision needs to be available to the compute service when testing the
placement.


This is not the case right now, and the ERT does not propose any way to
solve it - (see how I hacked around needing to be able to get
extra_specs when making claims in [3], without hammering the DB). The
result will be that any resource that we add and needs user supplied
info for scheduling an instance against it, will need a buggy
re-implementation of gathering all the bits from the request that
scheduler sees, to be able to work properly.


Well, ERT does provide a plugin mechanism for testing resources at the
claim level. It is the plugin's responsibility to implement a test()
method [2.1], which is called when test_claim() [2.2] runs.


So, provided this method is implemented, a local host check can be done 
based on the host's view of resources.
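
To make the idea concrete, here is a minimal sketch of such a plugin check.
It is not the code behind [2.1] or [2.2]: the class InstanceCountPlugin, the
helper claim_failures() and the 'max_instances' limit key are all
hypothetical names made up for illustration. It only assumes that a plugin's
test() receives a usage blob plus some limits and returns a reason when the
request does not fit.

```python
class InstanceCountPlugin:
    """Hypothetical plugin: counts the instances running on this host."""

    def __init__(self, running=0):
        self.running = running

    def test(self, usage, limits):
        """Return None if one more instance fits, else a reason string."""
        limit = limits.get("max_instances")  # assumed limit key, not a Nova one
        if limit is None:
            return None  # no limit configured, so the check is a no-op
        if self.running + 1 > limit:
            return "instance limit %d reached" % limit
        return None

    def add_instance(self, usage):
        """Record that one more instance has been placed on the host."""
        self.running += 1


def claim_failures(plugins, usage, limits):
    """Ask every plugin to test the request; an empty list means it passes."""
    reasons = [plugin.test(usage, limits) for plugin in plugins]
    return [reason for reason in reasons if reason is not None]
```

With a limit configured, the check passes until the host is full and then
reports why it failed; without a limit it behaves like the no-op test()
methods discussed further down.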




This is obviously a bigger concern when we want to allow users to pass
data (through image or flavor) that can affect scheduling, but still a
huge concern IMHO.


And here is where I agree with you: at the moment, ResourceTracker (and
consequently the Extensible RT) only provides a view of the resources the
host knows about (see my point above), and some other resources may be
missing.
So, whether or not you go with ERT, your patch [3] is still worthwhile
if we want to avoid a DB lookup each time a claim is made.




I see that there are already BPs proposing to use this IMHO broken
ERT ([4] for example), which will surely add to the proliferation of
code that hacks around these design shortcomings in what is already a
messy, but also crucial (for perf as well as features), bit of Nova code.


Two distinct implementations of that spec (i.e. instances and flavors)
have been proposed [2.3] [2.4], so reviews are welcome. If you look at the
test() method, it is a no-op in both plugins. I'm open to comments
because I have the following problem: how can we define a limit on just a
counter of instances and flavors?





I propose to revert [2] ASAP since it is still fresh, and see how we can
come up with a cleaner design.

Would like to hear opinions on this, before I propose the patch tho!


IMHO, the problem is more likely that the regular RT lacks some
information for each host, so it has to be handled on a case-by-case
basis, but I don't think ERT either increases complexity or creates
another issue.



Thanks,
-Sylvain


Thanks all,

Nikola

[1] https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement
[2] https://review.openstack.org/#/c/109643/
[3] https://review.openstack.org/#/c/111782/
[4] https://review.openstack.org/#/c/89893



[2.1] https://github.com/openstack/nova/blob/master/nova/compute/resources/__init__.py#L75
[2.2] https://github.com/openstack/nova/blob/master/nova/compute/claims.py#L134

[2.3] https://review.openstack.org/112578
[2.4] https://review.openstack.org/113373





Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

2014-08-12 Thread Nikola Đipanov
On 08/12/2014 04:49 PM, Sylvain Bauza wrote:
 (sorry for reposting, missed 2 links...)
 
 Hi Nikola,
 
 On 12/08/2014 12:21, Nikola Đipanov wrote:
 Hey Nova-istas,

 While I was hacking on [1] I was considering how to approach the fact
 that we now need to track one more thing (NUMA node utilization) in our
 resources. I went with - I'll add it to compute nodes table thinking
 it's a fundamental enough property of a compute host that it deserves to
 be there, although I was considering the Extensible Resource Tracker at one
 point (ERT from now on - see [2]) but looking at the code - it did not
 seem to provide anything I desperately needed, so I went with keeping it
 simple.

 So fast-forward a few days, and I caught myself solving a problem that I
 kept thinking ERT should have solved - but apparently hasn't, and I
 think it is fundamentally a broken design without it - so I'd really
 like to see it re-visited.

 The problem can be described by the following lemma (if you take 'lemma'
 to mean 'a sentence I came up with just now' :)):

 
 Due to the way scheduling works in Nova (roughly: pick a host based on
 stale(ish) data, rely on claims to trigger a re-schedule), the _exact same_
 information that the scheduling service used when making a placement
 decision needs to be available to the compute service when testing the
 placement.
 

 This is not the case right now, and the ERT does not propose any way to
 solve it - (see how I hacked around needing to be able to get
 extra_specs when making claims in [3], without hammering the DB). The
 result will be that any resource that we add and needs user supplied
 info for scheduling an instance against it, will need a buggy
 re-implementation of gathering all the bits from the request that
 scheduler sees, to be able to work properly.
 
 Well, ERT does provide a plugin mechanism for testing resources at the
 claim level. It is the plugin's responsibility to implement a test()
 method [2.1], which is called when test_claim() [2.2] runs.
 
 So, provided this method is implemented, a local host check can be done
 based on the host's view of resources.
 
 

Yes - the problem is there is no clear API to get all the needed bits to
do so - especially the user-supplied ones from image and flavors.
On top of that, in the current implementation we only pass a hand-wavy
'usage' blob in. This makes anyone wanting to use this in conjunction
with some of the user-supplied bits roll their own
'extract_data_from_instance_metadata_flavor_image' or similar, which is
horrible and also likely bad for performance.
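
As a rough illustration of what a clearer interface could look like, the
sketch below bundles the request data once and hands the same object to the
claim-time check. Nothing here is proposed in [2] or [3]: RequestInfo,
numa_nodes_requested() and check_numa() are hypothetical names, and the
'hw:numa_nodes' / 'hw_numa_nodes' keys are used purely as example keys for
user-supplied flavor and image data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RequestInfo:
    """Hypothetical bundle of everything the scheduler looked at."""
    usage: dict
    extra_specs: dict = field(default_factory=dict)  # from the flavor
    image_meta: dict = field(default_factory=dict)   # from the image


def numa_nodes_requested(request):
    """Read a user-supplied value; the flavor wins over the image here."""
    value = request.extra_specs.get(
        "hw:numa_nodes", request.image_meta.get("hw_numa_nodes", 1))
    return int(value)


def check_numa(request, free_numa_nodes):
    """Claim-time check that uses the same data the scheduler would see."""
    wanted = numa_nodes_requested(request)
    if wanted > free_numa_nodes:
        return "wanted %d NUMA nodes, %d free" % (wanted, free_numa_nodes)
    return None
```

The point of the sketch is only that the extraction logic lives in one
place and is fed from one object, instead of each plugin fishing the values
out of the instance, flavor and image separately.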

 This is obviously a bigger concern when we want to allow users to pass
 data (through image or flavor) that can affect scheduling, but still a
 huge concern IMHO.
 
 And here is where I agree with you: at the moment, ResourceTracker (and
 consequently the Extensible RT) only provides a view of the resources the
 host knows about (see my point above), and some other resources may be
 missing.
 So, whether or not you go with ERT, your patch [3] is still worthwhile
 if we want to avoid a DB lookup each time a claim is made.
 
 
 I see that there are already BPs proposing to use this IMHO broken
 ERT ([4] for example), which will surely add to the proliferation of
 code that hacks around these design shortcomings in what is already a
 messy, but also crucial (for perf as well as features), bit of Nova code.
 
 Two distinct implementations of that spec (i.e. instances and flavors)
 have been proposed [2.3] [2.4], so reviews are welcome. If you look at the
 test() method, it is a no-op in both plugins. I'm open to comments
 because I have the following problem: how can we define a limit on just a
 counter of instances and flavors?
 

Will look at these - but none of them seem to hit the issue I am
complaining about, which is that claims will need to consider other
request data, not only data available on instances.

Also - the fact that you don't implement test() in flavor ones tells me
that the implementation is indeed racy (but it is racy atm as well) and
two requests can indeed race for the same host, and since no claims are
done, both can succeed. This is I believe (at least in case of single
flavor hosts) unlikely to happen in practice, but you get the idea.
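
A toy model of that race, for illustration only: HostTracker and its
'enforce' flag are hypothetical and not taken from the resource tracker
code. With enforce=False the check is skipped, which mimics a plugin whose
test() is a no-op, so two requests for the last slot both succeed. With the
check done inside the same critical section as the usage update, the second
request is refused.

```python
import threading


class HostTracker:
    """Hypothetical per-host tracker with room for `capacity` instances."""

    def __init__(self, capacity, enforce=True):
        self.capacity = capacity
        self.enforce = enforce  # False mimics a plugin whose test() is a no-op
        self.used = 0
        self._lock = threading.Lock()

    def claim(self):
        """Test the request and record its usage in one critical section."""
        with self._lock:
            if self.enforce and self.used + 1 > self.capacity:
                return False  # the caller would trigger a re-schedule
            self.used += 1
            return True
```

For a host with capacity 1, two claims against the enforcing tracker give
one success and one refusal, while the non-enforcing tracker accepts both
and ends up over-committed.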

 
 
 I propose to revert [2] ASAP since it is still fresh, and see how we can
 come up with a cleaner design.

 Would like to hear opinions on this, before I propose the patch tho!
 
 IMHO, the problem is more likely that the regular RT lacks some
 information for each host, so it has to be handled on a case-by-case
 basis, but I don't think ERT either increases complexity or creates
 another issue.
 

RT does not miss info about the host, but about the particular request,
which we have to fish out of different places like image_metadata,
extra_specs etc. - yet it can't really work without them. This is
definitely an RT issue that is not specific to ERT.

However, I still see several issues 

Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

2014-08-12 Thread Sylvain Bauza


On 12/08/2014 18:54, Nikola Đipanov wrote:

On 08/12/2014 04:49 PM, Sylvain Bauza wrote:

(sorry for reposting, missed 2 links...)

Hi Nikola,

On 12/08/2014 12:21, Nikola Đipanov wrote:

Hey Nova-istas,

While I was hacking on [1] I was considering how to approach the fact
that we now need to track one more thing (NUMA node utilization) in our
resources. I went with - I'll add it to compute nodes table thinking
it's a fundamental enough property of a compute host that it deserves to
be there, although I was considering the Extensible Resource Tracker at one
point (ERT from now on - see [2]) but looking at the code - it did not
seem to provide anything I desperately needed, so I went with keeping it
simple.

So fast-forward a few days, and I caught myself solving a problem that I
kept thinking ERT should have solved - but apparently hasn't, and I
think it is fundamentally a broken design without it - so I'd really
like to see it re-visited.

The problem can be described by the following lemma (if you take 'lemma'
to mean 'a sentence I came up with just now' :)):


Due to the way scheduling works in Nova (roughly: pick a host based on
stale(ish) data, rely on claims to trigger a re-schedule), the _exact same_
information that the scheduling service used when making a placement
decision needs to be available to the compute service when testing the
placement.


This is not the case right now, and the ERT does not propose any way to
solve it - (see how I hacked around needing to be able to get
extra_specs when making claims in [3], without hammering the DB). The
result will be that any resource that we add and needs user supplied
info for scheduling an instance against it, will need a buggy
re-implementation of gathering all the bits from the request that
scheduler sees, to be able to work properly.

Well, ERT does provide a plugin mechanism for testing resources at the
claim level. It is the plugin's responsibility to implement a test()
method [2.1], which is called when test_claim() [2.2] runs.

So, provided this method is implemented, a local host check can be done
based on the host's view of resources.



Yes - the problem is there is no clear API to get all the needed bits to
do so - especially the user-supplied ones from image and flavors.
On top of that, in the current implementation we only pass a hand-wavy
'usage' blob in. This makes anyone wanting to use this in conjunction
with some of the user-supplied bits roll their own
'extract_data_from_instance_metadata_flavor_image' or similar, which is
horrible and also likely bad for performance.


I see your concern that there is no interface for user-facing resources
like flavor or image metadata.
I also agree that the big 'usage' blob is not a good choice for the
long-term vision.


That said, I don't think we should, as we say in French, throw the baby
out with the bath water... i.e. the problem is with the RT, not the ERT
(apart from the third-party API point that you noted - I'll come back to
it below).

This is obviously a bigger concern when we want to allow users to pass
data (through image or flavor) that can affect scheduling, but still a
huge concern IMHO.

And here is where I agree with you: at the moment, ResourceTracker (and
consequently the Extensible RT) only provides a view of the resources the
host knows about (see my point above), and some other resources may be
missing.
So, whether or not you go with ERT, your patch [3] is still worthwhile
if we want to avoid a DB lookup each time a claim is made.



I see that there are already BPs proposing to use this IMHO broken
ERT ([4] for example), which will surely add to the proliferation of
code that hacks around these design shortcomings in what is already a
messy, but also crucial (for perf as well as features), bit of Nova code.

Two distinct implementations of that spec (i.e. instances and flavors)
have been proposed [2.3] [2.4], so reviews are welcome. If you look at the
test() method, it is a no-op in both plugins. I'm open to comments
because I have the following problem: how can we define a limit on just a
counter of instances and flavors?


Will look at these - but none of them seem to hit the issue I am
complaining about, which is that claims will need to consider other
request data, not only data available on instances.

Also - the fact that you don't implement test() in flavor ones tells me
that the implementation is indeed racy (but it is racy atm as well) and
two requests can indeed race for the same host, and since no claims are
done, both can succeed. This is I believe (at least in case of single
flavor hosts) unlikely to happen in practice, but you get the idea.


Agreed, these 2 patches probably require another iteration, in
particular to make sure that they won't be racy. So I need another pass
to think about what to test() for these 2 examples.
Another patch has to be done for aggregates, but it's still WIP so it is
not mentioned here.


Anyway, as discussed during today's