Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-23 Thread Jay Pipes

On 06/22/2016 01:56 PM, Paul Michali wrote:

I did have a question about the current implementation as described by
292499, 324379, and 292500.

Looking at the code, when a NUMAPagesTopology object is created, a new
parameter is passed for the "reserved" pages. This reservation comes
from a dictionary, which is populated at LibvirtDriver init time by
reading the multi-string configuration settings from nova.conf. Because
the object's API is changed, a version change is required.

Is it possible, instead of adding a new argument, to reduce the
"total" argument by the number of reserved pages from the config file
(Ian Wells suggested this to me on a patch I had)? This would avoid
the need to alter the object's API.  So, instead of:

    mempages = [
        objects.NUMAPagesTopology(
            size_kb=pages.size,
            total=pages.total,
            used=0,
            reserved=_get_reserved_memory_for_cell(
                self, cell.id, pages.size))
        for pages in cell.mempages]


Do something like this...

    mempages = [
        objects.NUMAPagesTopology(
            size_kb=pages.size,
            used=0,
            total=pages.total - _get_reserved_memory_for_cell(
                self, cell.id, pages.size))
        for pages in cell.mempages]
If we do this, would it avoid issues with back porting the change?


No, that would cause anyone who upgraded to Mitaka to immediately have 
the total column contain incorrect data... you would essentially have a 
double-reserve calculation.
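[A minimal, simplified sketch of the double-reserve pitfall described above
(illustrative only, not the actual NUMAPagesTopology code): if a backport
folds the reservation into the stored "total", and newer code then also
applies an explicit "reserved" field, the reservation is counted twice.]

    # Simplified illustration only -- not the real NUMAPagesTopology logic.
    HOST_TOTAL = 32768   # 2MB pages actually present on the NUMA node
    RESERVED = 512       # pages reserved via nova.conf

    # Backported approach: reservation folded into the stored total.
    stored_total = HOST_TOTAL - RESERVED              # 32256

    # Newer code that carries an explicit "reserved" field then computes
    # free pages from that already-reduced total:
    used = 0
    free = stored_total - used - RESERVED             # 31744 instead of 32256

    # The reservation has been subtracted twice, so the node's capacity
    # (the "total" column) is under-reported after the upgrade.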


Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-22 Thread Paul Michali
I did have a question about the current implementation as described by
292499, 324379, and 292500.

Looking at the code, when a NUMAPagesTopology object is created, a new
parameter is passed for the "reserved" pages. This reservation comes from a
dictionary, which is populated at LibvirtDriver init time by reading the
multi-string configuration settings from nova.conf. Because the object's
API is changed, a version change is required.

Is it possible, instead of adding a new argument, to reduce the "total"
argument by the number of reserved pages from the config file (Ian Wells
suggested this to me on a patch I had)? This would avoid the need to
alter the object's API.  So, instead of:

    mempages = [
        objects.NUMAPagesTopology(
            size_kb=pages.size,
            total=pages.total,
            used=0,
            reserved=_get_reserved_memory_for_cell(
                self, cell.id, pages.size))
        for pages in cell.mempages]


Do something like this...

    mempages = [
        objects.NUMAPagesTopology(
            size_kb=pages.size,
            used=0,
            total=pages.total - _get_reserved_memory_for_cell(
                self, cell.id, pages.size))
        for pages in cell.mempages]
If we do this, would it avoid issues with back porting the change?

Thanks!

PCM


On Wed, Jun 15, 2016 at 5:52 PM Matt Riedemann 
wrote:

> On 6/15/2016 3:10 PM, Paul Michali wrote:
> > Is the plan to back port that change to Mitaka?
> >
> > Thanks,
> >
> > PCM
> >
> >
> > On Wed, Jun 15, 2016 at 1:31 PM Matt Riedemann wrote:
> >
> > On 6/14/2016 3:09 PM, Jay Pipes wrote:
> > >
> > > Yes. Code merged recently from Sahid does this:
> > >
> > > https://review.openstack.org/#/c/277422/
> > >
> > > Best,
> > > -jay
> > >
> >
> > That was actually reverted out of mitaka:
> >
> > https://review.openstack.org/#/c/292290/
> >
> > The feature change that got into newton was this:
> >
> > https://review.openstack.org/#/c/292499/
> >
> > Which was busted, and required:
> >
> > https://review.openstack.org/#/c/324379/
> >
> > Well, required as long as you want your compute service to start. :)
> >
> > And no, we aren't backporting these, especially to liberty which is
> > security / critical fix mode only now.
> >
> > --
> >
> > Thanks,
> >
> > Matt Riedemann
> >
> >
> >
>  __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe:
> > openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> >
> >
> >
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
>
> No, it's really a feature.
>
> --
>
> Thanks,
>
> Matt Riedemann
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-15 Thread Matt Riedemann

On 6/15/2016 3:10 PM, Paul Michali wrote:

Is the plan to back port that change to Mitaka?

Thanks,

PCM


On Wed, Jun 15, 2016 at 1:31 PM Matt Riedemann wrote:

On 6/14/2016 3:09 PM, Jay Pipes wrote:
>
> Yes. Code merged recently from Sahid does this:
>
> https://review.openstack.org/#/c/277422/
>
> Best,
> -jay
>

That was actually reverted out of mitaka:

https://review.openstack.org/#/c/292290/

The feature change that got into newton was this:

https://review.openstack.org/#/c/292499/

Which was busted, and required:

https://review.openstack.org/#/c/324379/

Well, required as long as you want your compute service to start. :)

And no, we aren't backporting these, especially to liberty which is
security / critical fix mode only now.

--

Thanks,

Matt Riedemann


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



No, it's really a feature.

--

Thanks,

Matt Riedemann


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-15 Thread Paul Michali
Is the plan to back port that change to Mitaka?

Thanks,

PCM


On Wed, Jun 15, 2016 at 1:31 PM Matt Riedemann 
wrote:

> On 6/14/2016 3:09 PM, Jay Pipes wrote:
> >
> > Yes. Code merged recently from Sahid does this:
> >
> > https://review.openstack.org/#/c/277422/
> >
> > Best,
> > -jay
> >
>
> That was actually reverted out of mitaka:
>
> https://review.openstack.org/#/c/292290/
>
> The feature change that got into newton was this:
>
> https://review.openstack.org/#/c/292499/
>
> Which was busted, and required:
>
> https://review.openstack.org/#/c/324379/
>
> Well, required as long as you want your compute service to start. :)
>
> And no, we aren't backporting these, especially to liberty which is
> security / critical fix mode only now.
>
> --
>
> Thanks,
>
> Matt Riedemann
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-15 Thread Matt Riedemann

On 6/14/2016 3:09 PM, Jay Pipes wrote:


Yes. Code merged recently from Sahid does this:

https://review.openstack.org/#/c/277422/

Best,
-jay



That was actually reverted out of mitaka:

https://review.openstack.org/#/c/292290/

The feature change that got into newton was this:

https://review.openstack.org/#/c/292499/

Which was busted, and required:

https://review.openstack.org/#/c/324379/

Well, required as long as you want your compute service to start. :)

And no, we aren't backporting these, especially to liberty which is 
security / critical fix mode only now.


--

Thanks,

Matt Riedemann


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-15 Thread Jay Pipes
There have been a number of bug fixes to the NUMA code in both Mitaka 
and Newton. I think you would need to be very careful in your backporting :)


-jay

On 06/15/2016 09:55 AM, Paul Michali wrote:

Yeah, was thinking more of technically vs policy. Wondering if there are
other dependencies or if I could patch this into a Liberty code base.


On Wed, Jun 15, 2016 at 12:38 PM Jay Pipes wrote:

On 06/15/2016 03:58 AM, Paul Michali wrote:
 > Awesome Jay!
 >
 > Do you think this is something that can be backporting to Liberty w/o
 > other dependencies? We're running Liberty on our system right now.

Doubtful, Paul :( The policy for upstream is not to backport feature
patches. This would be something you would need to do yourself -- i.e.
keep patch technical debt for whichever distribution of OpenStack you
are using (or sources if you're going that route).

Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-15 Thread Paul Michali
Yeah, was thinking more of technically vs policy. Wondering if there are
other dependencies or if I could patch this into a Liberty code base.


On Wed, Jun 15, 2016 at 12:38 PM Jay Pipes  wrote:

> On 06/15/2016 03:58 AM, Paul Michali wrote:
> > Awesome Jay!
> >
> > Do you think this is something that can be backporting to Liberty w/o
> > other dependencies? We're running Liberty on our system right now.
>
> Doubtful, Paul :( The policy for upstream is not to backport feature
> patches. This would be something you would need to do yourself -- i.e.
> keep patch technical debt for whichever distribution of OpenStack you
> are using (or sources if you're going that route).
>
> Best,
> -jay
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-15 Thread Jay Pipes

On 06/15/2016 03:58 AM, Paul Michali wrote:

Awesome Jay!

Do you think this is something that can be backporting to Liberty w/o
other dependencies? We're running Liberty on our system right now.


Doubtful, Paul :( The policy for upstream is not to backport feature 
patches. This would be something you would need to do yourself -- i.e. 
keep patch technical debt for whichever distribution of OpenStack you 
are using (or sources if you're going that route).


Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-15 Thread Paul Michali
Awesome Jay!

Do you think this is something that can be backported to Liberty w/o other
dependencies? We're running Liberty on our system right now.


On Tue, Jun 14, 2016 at 4:10 PM Jay Pipes  wrote:

> On 06/14/2016 12:30 PM, Paul Michali wrote:
> > Well, looks like we figured out what is going on - maybe folks have some
> > ideas on how we could handle this issue.
> >
> > What I see is that for each VM create (small flavor), 1024 huge pages
> > are used and NUMA node 0 used. It appears that, when there is no longer
> > enough huge pages on that NUMA node, Nova with then schedule to the
> > other NUMA node and use those huge pages.
> >
> > In our case, we happen to have a special container running on the
> > compute nodes, that uses 512 huge pages. As a result, when there are 768
> > huge pages left, Nova thinks there are 1280 pages left and thinks one
> > more VM can be create. It tries, but the create fails.
> >
> > Some questions...
> >
> > 1) Is there some way to "reserve" huge pages in Nova?
>
> Yes. Code merged recently from Sahid does this:
>
> https://review.openstack.org/#/c/277422/
>
> Best,
> -jay
>
> > 2) If the create fails, should Nova try the other NUMA node (or is this
> > because it doesn't know why it failed)?
> > 3) Any ideas on how we can deal with this - without changing Nova?
> >
> > Thanks!
> >
> > PCM
> >
> >
> >
> > On Tue, Jun 14, 2016 at 1:09 PM Paul Michali  > > wrote:
> >
> > Great info Chris and thanks for confirming the assignment of blocks
> > of pages to a numa node.
> >
> > I'm still struggling with why each VM is being assigned to NUMA node
> > 0. Any ideas on where I should look to see why Nova is not using
> > NUMA id 1?
> >
> > Thanks!
> >
> >
> > PCM
> >
> >
> > On Tue, Jun 14, 2016 at 10:29 AM Chris Friesen wrote:
> >
> > On 06/13/2016 02:17 PM, Paul Michali wrote:
> >  > Hmm... I tried Friday and again today, and I'm not seeing the
> > VMs being evenly
> >  > created on the NUMA nodes. Every Cirros VM is created on
> > nodeid 0.
> >  >
> >  > I have the m1/small flavor (@GB) selected and am using
> > hw:numa_nodes=1 and
> >  > hw:mem_page_size=2048 flavor-key settings. Each VM is
> > consuming 1024 huge pages
> >  > (of size 2MB), but is on nodeid 0 always. Also, it seems that
> > when I reach 1/2
> >  > of the total number of huge pages used, libvirt gives an
> > error saying there is
> >  > not enough memory to create the VM. Is it expected that the
> > huge pages are
> >  > "allocated" to each NUMA node?
> >
> > Yes, any given memory page exists on one NUMA node, and a
> > single-NUMA-node VM
> > will be constrained to a single host NUMA node and will use
> > memory from that
> > host NUMA node.
> >
> > You can see and/or adjust how many hugepages are available on
> > each NUMA node via
> > /sys/devices/system/node/nodeX/hugepages/hugepages-2048kB/*
> > where X is the host
> > NUMA node number.
> >
> > Chris
> >
> >
> >
>  __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe:
> > openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> >
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> >
> >
> >
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-14 Thread Jay Pipes

On 06/14/2016 12:30 PM, Paul Michali wrote:

Well, looks like we figured out what is going on - maybe folks have some
ideas on how we could handle this issue.

What I see is that for each VM create (small flavor), 1024 huge pages
are used and NUMA node 0 used. It appears that, when there is no longer
enough huge pages on that NUMA node, Nova with then schedule to the
other NUMA node and use those huge pages.

In our case, we happen to have a special container running on the
compute nodes, that uses 512 huge pages. As a result, when there are 768
huge pages left, Nova thinks there are 1280 pages left and thinks one
more VM can be create. It tries, but the create fails.

Some questions...

1) Is there some way to "reserve" huge pages in Nova?


Yes. Code merged recently from Sahid does this:

https://review.openstack.org/#/c/277422/
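[For reference, that change adds a multi-valued nova.conf option for the
reservation; the snippet below is an illustrative sketch only -- the exact
option name and key syntax come from the merged change, so check it for
your release.]

    [DEFAULT]
    # Reserve 512 x 2MB pages on each NUMA node for non-Nova consumers
    # (option name/format shown here illustratively).
    reserved_huge_pages = node:0,size:2048,count:512
    reserved_huge_pages = node:1,size:2048,count:512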

Best,
-jay


2) If the create fails, should Nova try the other NUMA node (or is this
because it doesn't know why it failed)?
3) Any ideas on how we can deal with this - without changing Nova?

Thanks!

PCM



On Tue, Jun 14, 2016 at 1:09 PM Paul Michali wrote:

Great info Chris and thanks for confirming the assignment of blocks
of pages to a numa node.

I'm still struggling with why each VM is being assigned to NUMA node
0. Any ideas on where I should look to see why Nova is not using
NUMA id 1?

Thanks!


PCM


On Tue, Jun 14, 2016 at 10:29 AM Chris Friesen wrote:

On 06/13/2016 02:17 PM, Paul Michali wrote:
 > Hmm... I tried Friday and again today, and I'm not seeing the
VMs being evenly
 > created on the NUMA nodes. Every Cirros VM is created on
nodeid 0.
 >
 > I have the m1/small flavor (@GB) selected and am using
hw:numa_nodes=1 and
 > hw:mem_page_size=2048 flavor-key settings. Each VM is
consuming 1024 huge pages
 > (of size 2MB), but is on nodeid 0 always. Also, it seems that
when I reach 1/2
 > of the total number of huge pages used, libvirt gives an
error saying there is
 > not enough memory to create the VM. Is it expected that the
huge pages are
 > "allocated" to each NUMA node?

Yes, any given memory page exists on one NUMA node, and a
single-NUMA-node VM
will be constrained to a single host NUMA node and will use
memory from that
host NUMA node.

You can see and/or adjust how many hugepages are available on
each NUMA node via
/sys/devices/system/node/nodeX/hugepages/hugepages-2048kB/*
where X is the host
NUMA node number.

Chris



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-14 Thread Paul Michali
Well, looks like we figured out what is going on - maybe folks have some
ideas on how we could handle this issue.

What I see is that for each VM created (small flavor), 1024 huge pages are
used and NUMA node 0 is used. It appears that, when there are no longer enough
huge pages on that NUMA node, Nova will then schedule to the other NUMA
node and use those huge pages.

In our case, we happen to have a special container running on the compute
nodes that uses 512 huge pages. As a result, when there are 768 huge pages
left, Nova thinks there are 1280 pages left and thinks one more VM can be
created. It tries, but the create fails.
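[To spell out the accounting mismatch with the numbers above -- illustrative
arithmetic only, not Nova code:]

    # Pages on the NUMA node, as seen by the host vs. as seen by Nova.
    actually_free = 768        # pages really available on the node
    hidden_usage = 512         # pages taken by the container, invisible to Nova
    nova_thinks_free = actually_free + hidden_usage    # 1280

    vm_needs = 1024            # 2GB flavor backed by 2MB pages
    print(nova_thinks_free >= vm_needs)   # True  -> Nova schedules the VM here
    print(actually_free >= vm_needs)      # False -> QEMU fails the allocation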

Some questions...

1) Is there some way to "reserve" huge pages in Nova?
2) If the create fails, should Nova try the other NUMA node (or is this
because it doesn't know why it failed)?
3) Any ideas on how we can deal with this - without changing Nova?

Thanks!

PCM



On Tue, Jun 14, 2016 at 1:09 PM Paul Michali  wrote:

> Great info Chris and thanks for confirming the assignment of blocks of
> pages to a numa node.
>
> I'm still struggling with why each VM is being assigned to NUMA node 0.
> Any ideas on where I should look to see why Nova is not using NUMA id 1?
>
> Thanks!
>
>
> PCM
>
>
> On Tue, Jun 14, 2016 at 10:29 AM Chris Friesen <
> chris.frie...@windriver.com> wrote:
>
>> On 06/13/2016 02:17 PM, Paul Michali wrote:
>> > Hmm... I tried Friday and again today, and I'm not seeing the VMs being
>> evenly
>> > created on the NUMA nodes. Every Cirros VM is created on nodeid 0.
>> >
>> > I have the m1/small flavor (@GB) selected and am using hw:numa_nodes=1
>> and
>> > hw:mem_page_size=2048 flavor-key settings. Each VM is consuming 1024
>> huge pages
>> > (of size 2MB), but is on nodeid 0 always. Also, it seems that when I
>> reach 1/2
>> > of the total number of huge pages used, libvirt gives an error saying
>> there is
>> > not enough memory to create the VM. Is it expected that the huge pages
>> are
>> > "allocated" to each NUMA node?
>>
>> Yes, any given memory page exists on one NUMA node, and a
>> single-NUMA-node VM
>> will be constrained to a single host NUMA node and will use memory from
>> that
>> host NUMA node.
>>
>> You can see and/or adjust how many hugepages are available on each NUMA
>> node via
>> /sys/devices/system/node/nodeX/hugepages/hugepages-2048kB/* where X is
>> the host
>> NUMA node number.
>>
>> Chris
>>
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-14 Thread Paul Michali
Great info Chris and thanks for confirming the assignment of blocks of
pages to a numa node.

I'm still struggling with why each VM is being assigned to NUMA node 0. Any
ideas on where I should look to see why Nova is not using NUMA id 1?

Thanks!


PCM


On Tue, Jun 14, 2016 at 10:29 AM Chris Friesen 
wrote:

> On 06/13/2016 02:17 PM, Paul Michali wrote:
> > Hmm... I tried Friday and again today, and I'm not seeing the VMs being
> evenly
> > created on the NUMA nodes. Every Cirros VM is created on nodeid 0.
> >
> > I have the m1/small flavor (@GB) selected and am using hw:numa_nodes=1
> and
> > hw:mem_page_size=2048 flavor-key settings. Each VM is consuming 1024
> huge pages
> > (of size 2MB), but is on nodeid 0 always. Also, it seems that when I
> reach 1/2
> > of the total number of huge pages used, libvirt gives an error saying
> there is
> > not enough memory to create the VM. Is it expected that the huge pages
> are
> > "allocated" to each NUMA node?
>
> Yes, any given memory page exists on one NUMA node, and a single-NUMA-node
> VM
> will be constrained to a single host NUMA node and will use memory from
> that
> host NUMA node.
>
> You can see and/or adjust how many hugepages are available on each NUMA
> node via
> /sys/devices/system/node/nodeX/hugepages/hugepages-2048kB/* where X is the
> host
> NUMA node number.
>
> Chris
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-14 Thread Chris Friesen

On 06/13/2016 02:17 PM, Paul Michali wrote:

Hmm... I tried Friday and again today, and I'm not seeing the VMs being evenly
created on the NUMA nodes. Every Cirros VM is created on nodeid 0.

I have the m1/small flavor (@GB) selected and am using hw:numa_nodes=1 and
hw:mem_page_size=2048 flavor-key settings. Each VM is consuming 1024 huge pages
(of size 2MB), but is on nodeid 0 always. Also, it seems that when I reach 1/2
of the total number of huge pages used, libvirt gives an error saying there is
not enough memory to create the VM. Is it expected that the huge pages are
"allocated" to each NUMA node?


Yes, any given memory page exists on one NUMA node, and a single-NUMA-node VM 
will be constrained to a single host NUMA node and will use memory from that 
host NUMA node.


You can see and/or adjust how many hugepages are available on each NUMA node via 
/sys/devices/system/node/nodeX/hugepages/hugepages-2048kB/* where X is the host 
NUMA node number.
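[As a rough sketch of checking that from a script, assuming 2MB pages and the
sysfs layout described above:]

    import glob
    import os

    # Print total/free 2MB huge pages for each host NUMA node.
    for node_dir in sorted(glob.glob('/sys/devices/system/node/node[0-9]*')):
        hp_dir = os.path.join(node_dir, 'hugepages', 'hugepages-2048kB')
        with open(os.path.join(hp_dir, 'nr_hugepages')) as f:
            total = int(f.read())
        with open(os.path.join(hp_dir, 'free_hugepages')) as f:
            free = int(f.read())
        print('%s: total=%d free=%d' % (os.path.basename(node_dir), total, free))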


Chris


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-13 Thread Paul Michali
Hmm... I tried Friday and again today, and I'm not seeing the VMs being
evenly created on the NUMA nodes. Every Cirros VM is created on nodeid 0.

I have the m1.small flavor (2GB) selected and am using hw:numa_nodes=1 and
hw:mem_page_size=2048 flavor-key settings. Each VM is consuming 1024 huge
pages (of size 2MB), but is on nodeid 0 always. Also, it seems that when I
reach 1/2 of the total number of huge pages used, libvirt gives an error
saying there is not enough memory to create the VM. Is it expected that the
huge pages are "allocated" to each NUMA node?

I don't know why I cannot repeat what I did on 6/3, where I changed
hw:mem_page_size from "large" to "2048" and it worked, allocating to each
of the two NUMA nodes. :(

Regards,

PCM


On Fri, Jun 10, 2016 at 9:16 AM Paul Michali  wrote:

> Actually, I had menm_page_size set to "large" and not "1024". However, it
> seemed like it was using 1024 pages per (small VM creation). Is there
> possibly some issue with large not using one of the supported values? I
> would have guessed it would have chosen 2M or 1G for the size.
>
> Any thoughts?
>
> PCM
>
> On Fri, Jun 10, 2016 at 9:05 AM Paul Michali  wrote:
>
>> Thanks Daniel and Chris! I think that was the problem, I had configured
>> Nova flavor with a mem_page_size of 1024, and it should have been one of
>> the supported values.
>>
>> I'll go through and check things out one more time, but I think that is
>> the problem. I still need to figure out what is going on with the neutron
>> port not being released - we have another person in my group who has seen
>> the same issue.
>>
>> Regards,
>>
>> PCM
>>
>> On Fri, Jun 10, 2016 at 4:41 AM Daniel P. Berrange 
>> wrote:
>>
>>> On Thu, Jun 09, 2016 at 12:35:06PM -0600, Chris Friesen wrote:
>>> > On 06/09/2016 05:15 AM, Paul Michali wrote:
>>> > > 1) On the host, I was seeing 32768 huge pages, of 2MB size.
>>> >
>>> > Please check the number of huge pages _per host numa node_.
>>> >
>>> > > 2) I changed mem_page_size from 1024 to 2048 in the flavor, and then
>>> when VMs
>>> > > were created, they were being evenly assigned to the two NUMA nodes.
>>> Each using
>>> > > 1024 huge pages. At this point I could create more than half, but
>>> when there
>>> > > were 1945 pages left, it failed to create a VM. Did it fail because
>>> the
>>> > > mem_page_size was 2048 and the available pages were 1945, even
>>> though we were
>>> > > only requesting 1024 pages?
>>> >
>>> > I do not think that "1024" is a valid page size (at least for x86).
>>>
>>> Correct, 4k, 2M and 1GB are valid page sizes.
>>>
>>> > Valid mem_page_size values are determined by the host CPU.  You do not
>>> need
>>> > a larger page size for flavors with larger memory sizes.
>>>
>>> Though note that page sizes should be a multiple of favour mem size
>>> unless you want to waste memory. eg if you have a flavour with 750MB
>>> RAM, then you probably don't want to use 1GB pages as it waste 250MB
>>>
>>> Regards,
>>> Daniel
>>> --
>>> |: http://berrange.com  -o-
>>> http://www.flickr.com/photos/dberrange/ :|
>>> |: http://libvirt.org  -o-
>>> http://virt-manager.org :|
>>> |: http://autobuild.org   -o-
>>> http://search.cpan.org/~danberr/ :|
>>> |: http://entangle-photo.org   -o-
>>> http://live.gnome.org/gtk-vnc :|
>>>
>>>
>>> __
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe:
>>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-10 Thread Paul Michali
Actually, I had mem_page_size set to "large" and not "1024". However, it
seemed like it was using 1024 pages per (small-flavor) VM creation. Is there
possibly some issue with "large" not using one of the supported values? I
would have guessed it would have chosen 2M or 1G for the size.

Any thoughts?

PCM

On Fri, Jun 10, 2016 at 9:05 AM Paul Michali  wrote:

> Thanks Daniel and Chris! I think that was the problem, I had configured
> Nova flavor with a mem_page_size of 1024, and it should have been one of
> the supported values.
>
> I'll go through and check things out one more time, but I think that is
> the problem. I still need to figure out what is going on with the neutron
> port not being released - we have another person in my group who has seen
> the same issue.
>
> Regards,
>
> PCM
>
> On Fri, Jun 10, 2016 at 4:41 AM Daniel P. Berrange 
> wrote:
>
>> On Thu, Jun 09, 2016 at 12:35:06PM -0600, Chris Friesen wrote:
>> > On 06/09/2016 05:15 AM, Paul Michali wrote:
>> > > 1) On the host, I was seeing 32768 huge pages, of 2MB size.
>> >
>> > Please check the number of huge pages _per host numa node_.
>> >
>> > > 2) I changed mem_page_size from 1024 to 2048 in the flavor, and then
>> when VMs
>> > > were created, they were being evenly assigned to the two NUMA nodes.
>> Each using
>> > > 1024 huge pages. At this point I could create more than half, but
>> when there
>> > > were 1945 pages left, it failed to create a VM. Did it fail because
>> the
>> > > mem_page_size was 2048 and the available pages were 1945, even though
>> we were
>> > > only requesting 1024 pages?
>> >
>> > I do not think that "1024" is a valid page size (at least for x86).
>>
>> Correct, 4k, 2M and 1GB are valid page sizes.
>>
>> > Valid mem_page_size values are determined by the host CPU.  You do not
>> need
>> > a larger page size for flavors with larger memory sizes.
>>
>> Though note that page sizes should be a multiple of favour mem size
>> unless you want to waste memory. eg if you have a flavour with 750MB
>> RAM, then you probably don't want to use 1GB pages as it waste 250MB
>>
>> Regards,
>> Daniel
>> --
>> |: http://berrange.com  -o-
>> http://www.flickr.com/photos/dberrange/ :|
>> |: http://libvirt.org  -o-
>> http://virt-manager.org :|
>> |: http://autobuild.org   -o-
>> http://search.cpan.org/~danberr/ :|
>> |: http://entangle-photo.org   -o-
>> http://live.gnome.org/gtk-vnc :|
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-10 Thread Paul Michali
Thanks Daniel and Chris! I think that was the problem, I had configured
Nova flavor with a mem_page_size of 1024, and it should have been one of
the supported values.

I'll go through and check things out one more time, but I think that is the
problem. I still need to figure out what is going on with the neutron port
not being released - we have another person in my group who has seen the
same issue.

Regards,

PCM

On Fri, Jun 10, 2016 at 4:41 AM Daniel P. Berrange 
wrote:

> On Thu, Jun 09, 2016 at 12:35:06PM -0600, Chris Friesen wrote:
> > On 06/09/2016 05:15 AM, Paul Michali wrote:
> > > 1) On the host, I was seeing 32768 huge pages, of 2MB size.
> >
> > Please check the number of huge pages _per host numa node_.
> >
> > > 2) I changed mem_page_size from 1024 to 2048 in the flavor, and then
> when VMs
> > > were created, they were being evenly assigned to the two NUMA nodes.
> Each using
> > > 1024 huge pages. At this point I could create more than half, but when
> there
> > > were 1945 pages left, it failed to create a VM. Did it fail because the
> > > mem_page_size was 2048 and the available pages were 1945, even though
> we were
> > > only requesting 1024 pages?
> >
> > I do not think that "1024" is a valid page size (at least for x86).
>
> Correct, 4k, 2M and 1GB are valid page sizes.
>
> > Valid mem_page_size values are determined by the host CPU.  You do not
> need
> > a larger page size for flavors with larger memory sizes.
>
> Though note that page sizes should be a multiple of favour mem size
> unless you want to waste memory. eg if you have a flavour with 750MB
> RAM, then you probably don't want to use 1GB pages as it waste 250MB
>
> Regards,
> Daniel
> --
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/
> :|
> |: http://libvirt.org  -o- http://virt-manager.org
> :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/
> :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc
> :|
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-10 Thread Paul Michali
See PCM: Inline...


On Thu, Jun 9, 2016 at 11:42 AM Steve Gordon <sgor...@redhat.com> wrote:

> - Original Message -
> > From: "Paul Michali" <p...@michali.net>
> > To: "OpenStack Development Mailing List (not for usage questions)" <
> openstack-dev@lists.openstack.org>
> > Sent: Tuesday, June 7, 2016 11:00:30 AM
> > Subject: Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling
> >
> > Anyone have any thoughts on the two questions below? Namely...
> >
> > If the huge pages are 2M, we are creating a 2GB VM, have 1945 huge pages,
> > should the allocation fail (and if so why)?
>
> Were enough pages (1024) available in a single NUMA node? Which release
> are you using? There was a bug where node 0 would always be picked (and
> eventually exhausted) but that was - theoretically - fixed under
> https://bugs.launchpad.net/nova/+bug/1386236


PCM: This is on Liberty, so it sounds like the bugfix was in there. It's
possible that there were not 1024 pages left on a single NUMA node.

Regards,

PCM


>
>
> > Why do all the 2GB VMs get created on the same NUMA node, instead of
> > getting evenly assigned to each of the two NUMA nodes that are available
> on
> > the compute node (as a result, allocation fails, when 1/2 the huge pages
> > are used)? I found that increasing mem_page_size to 2048 resolves the
> > issue, but don't know why.
>
> What was the mem_page_size before it was 2048? I didn't think any smaller
> value was supported.
>
> > ANother thing I was seeing, when the VM create failed due to not enough
> > huge pages available and was in error state, I could delete the VM, but
> the
> > Neutron port was still there.  Is that correct?
> >
> > I didn't see any log messages in neutron, requesting to unbind and delete
> > the port.
> >
> > Thanks!
> >
> > PCM
> >
> > .
> >
> > On Fri, Jun 3, 2016 at 2:03 PM Paul Michali <p...@michali.net> wrote:
> >
> > > Thanks for the link Tim!
> > >
> > > Right now, I have two things I'm unsure about...
> > >
> > > One is that I had 1945 huge pages left (of size 2048k) and tried to
> create
> > > a VM with a small flavor (2GB), which should need 1024 pages, but Nova
> > > indicated that it wasn't able to find a host (and QEMU reported an
> > > allocation issue).
> > >
> > > The other is that VMs are not being evenly distributed on my two NUMA
> > > nodes, and instead, are getting created all on one NUMA node. Not sure
> if
> > > that is expected (and setting mem_page_size to 2048 is the proper way).
> > >
> > > Regards,
> > >
> > > PCM
> > >
> > >
> > > On Fri, Jun 3, 2016 at 1:21 PM Tim Bell <tim.b...@cern.ch> wrote:
> > >
> > >> The documentation at
> > >> http://docs.openstack.org/admin-guide/compute-flavors.html is
> gradually
> > >> improving. Are there areas which were not covered in your
> clarifications ?
> > >> If so, we should fix the documentation too since this is a complex
> area to
> > >> configure and good documentation is a great help.
> > >>
> > >>
> > >>
> > >> BTW, there is also an issue around how the RAM for the BIOS is
> shadowed.
> > >> I can’t find the page from a quick google but we found an imbalance
> when
> > >> we
> > >> used 2GB pages as the RAM for BIOS shadowing was done by default in
> the
> > >> memory space for only one of the NUMA spaces.
> > >>
> > >>
> > >>
> > >> Having a look at the KVM XML can also help a bit if you are debugging.
> > >>
> > >>
> > >>
> > >> Tim
> > >>
> > >>
> > >>
> > >> *From: *Paul Michali <p...@michali.net>
> > >> *Reply-To: *"OpenStack Development Mailing List (not for usage
> > >> questions)" <openstack-dev@lists.openstack.org>
> > >> *Date: *Friday 3 June 2016 at 15:18
> > >> *To: *"Daniel P. Berrange" <berra...@redhat.com>, "OpenStack
> Development
> > >> Mailing List (not for usage questions)" <
> > >> openstack-dev@lists.openstack.org>
> > >> *Subject: *Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling
> > >>
> > >>
> > >>
> > >> See PCM inline...
> > >>
> > >> On Fri, Jun 3, 2016 at 8:44 AM Daniel P. Berrange &

Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-10 Thread Paul Michali
I'll try to reproduce and collect logs for a bug report.

Thanks for the info.

PCM


On Thu, Jun 9, 2016 at 9:43 AM Matt Riedemann 
wrote:

> On 6/9/2016 6:15 AM, Paul Michali wrote:
> >
> >
> > On Wed, Jun 8, 2016 at 11:21 PM Chris Friesen wrote:
> >
> > On 06/03/2016 12:03 PM, Paul Michali wrote:
> > > Thanks for the link Tim!
> > >
> > > Right now, I have two things I'm unsure about...
> > >
> > > One is that I had 1945 huge pages left (of size 2048k) and tried
> > to create a VM
> > > with a small flavor (2GB), which should need 1024 pages, but Nova
> > indicated that
> > > it wasn't able to find a host (and QEMU reported an allocation
> issue).
> > >
> > > The other is that VMs are not being evenly distributed on my two
> > NUMA nodes, and
> > > instead, are getting created all on one NUMA node. Not sure if
> > that is expected
> > > (and setting mem_page_size to 2048 is the proper way).
> >
> >
> > Just in case you haven't figured out the problem...
> >
> > Have you checked the per-host-numa-node 2MB huge page availability
> > on your host?
> >   If it's uneven then that might explain what you're seeing.
> >
> >
> > These are the observations/questions I have:
> >
> > 1) On the host, I was seeing 32768 huge pages, of 2MB size. When I
> > created VMs (Cirros) using small flavor, each VM was getting created on
> > NUMA nodeid 0. When it hit half of the available pages, I could no
> > longer create any VMs (QEMU saying no space). I'd like to understand why
> > the assignment was always going two nodeid 0, and to confirm that the
> > huge pages are divided among the number of NUMA nodes available.
> >
> > 2) I changed mem_page_size from 1024 to 2048 in the flavor, and then
> > when VMs were created, they were being evenly assigned to the two NUMA
> > nodes. Each using 1024 huge pages. At this point I could create more
> > than half, but when there were 1945 pages left, it failed to create a
> > VM. Did it fail because the mem_page_size was 2048 and the available
> > pages were 1945, even though we were only requesting 1024 pages?
> >
> > 3) Related to #2, is there a relationship between mem_page_size, the
> > allocation of VMs to NUMA nodes, and the flavor size? IOW, if I use the
> > medium flavor (4GB), will I need a larger mem_page_size? (I'll play with
> > this variation, as soon as I can). Gets back to understanding how the
> > scheduling determines how to assign the VMs.
> >
> > 4) When the VM create failed due to QEMU failing allocation, the VM went
> > to error state. I deleted the VM, but the neutron port was still there,
> > and there were no log messages indicating that a request was made to
> > delete the port. Is this expected (that the user would have to manually
> > clean up the port)?
>
> When you hit this case, can you check if instance.host is set in the
> database before deleting the instance? I'm guessing what's happening is
> the instance didn't get assigned a host since it eventually ended up
> with NoValidHost, so when you go to delete it doesn't have a compute to
> send it to for delete, so it deletes from the compute API, and we don't
> have the host binding details to delete the port.
>
> Although, when the spawn failed in the compute to begin with we should
> have deallocated any networking that was created before kicking back to
> the scheduler - unless we don't go back to the scheduler if the instance
> is set to ERROR state.
>
> A bug report with stacktrace of the failure scenario when the instance
> goes to error state bug n-cpu logs would probably help.
>
> >
> > 5) A coworker had hit the problem mentioned in #1, with exhaustion at
> > the halfway point. If she delete's a VM, and then changes the flavor to
> > change the mem_page_size to 2048, should Nova start assigning all new
> > VMs to the other NUMA node, until the pool of huge pages is down to
> > where the huge pages are for NUMA node 0, or will it alternate between
> > the available NUMA nodes (and run out when node 0's pool is exhausted)?
> >
> > Thanks in advance!
> >
> > PCM
> >
> >
> >
> >
> > Chris
> >
> >
>  __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe:
> > openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> >
> >
> >
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
>
>
> --
>
> Thanks,
>
> 

Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-10 Thread Daniel P. Berrange
On Thu, Jun 09, 2016 at 12:35:06PM -0600, Chris Friesen wrote:
> On 06/09/2016 05:15 AM, Paul Michali wrote:
> > 1) On the host, I was seeing 32768 huge pages, of 2MB size.
> 
> Please check the number of huge pages _per host numa node_.
> 
> > 2) I changed mem_page_size from 1024 to 2048 in the flavor, and then when 
> > VMs
> > were created, they were being evenly assigned to the two NUMA nodes. Each 
> > using
> > 1024 huge pages. At this point I could create more than half, but when there
> > were 1945 pages left, it failed to create a VM. Did it fail because the
> > mem_page_size was 2048 and the available pages were 1945, even though we 
> > were
> > only requesting 1024 pages?
> 
> I do not think that "1024" is a valid page size (at least for x86).

Correct, 4k, 2M and 1GB are valid page sizes.

> Valid mem_page_size values are determined by the host CPU.  You do not need
> a larger page size for flavors with larger memory sizes.

Though note that the flavour mem size should be a multiple of the page size
unless you want to waste memory. eg if you have a flavour with 750MB
RAM, then you probably don't want to use 1GB pages as it wastes ~250MB
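[A quick worked example of that waste -- illustrative arithmetic only; with
MiB-based pages the exact figure is 274MB, roughly the 250MB mentioned:]

    # A 750MB flavour backed by 1GB (1024MB) pages: the guest gets whole
    # pages, so its memory is rounded up to the next page boundary.
    flavour_mb = 750
    page_mb = 1024
    pages_needed = -(-flavour_mb // page_mb)          # ceiling division -> 1
    wasted_mb = pages_needed * page_mb - flavour_mb   # 274MB wasted
    print(pages_needed, wasted_mb)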

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-09 Thread Chris Friesen

On 06/09/2016 05:15 AM, Paul Michali wrote:

1) On the host, I was seeing 32768 huge pages, of 2MB size.


Please check the number of huge pages _per host numa node_.


2) I changed mem_page_size from 1024 to 2048 in the flavor, and then when VMs
were created, they were being evenly assigned to the two NUMA nodes. Each using
1024 huge pages. At this point I could create more than half, but when there
were 1945 pages left, it failed to create a VM. Did it fail because the
mem_page_size was 2048 and the available pages were 1945, even though we were
only requesting 1024 pages?


I do not think that "1024" is a valid page size (at least for x86).

Be careful about units. mem_page_size is in units of KB.  For x86, valid 
numerical sizes are 4, 2048, and 1048576.  (For 4KB, 2MB, and 1GB hugepages.) 
The flavor specifies memory size in MB.
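[To make the units concrete, a small sketch of the arithmetic -- flavor RAM
in MB, mem_page_size in KB:]

    # How many huge pages a flavor consumes on its single NUMA node.
    flavor_ram_mb = 2048        # e.g. a 2GB "small" flavor
    mem_page_size_kb = 2048     # 2MB pages

    pages_needed = (flavor_ram_mb * 1024) // mem_page_size_kb
    print(pages_needed)         # 1024 pages, matching the per-VM usage above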



3) Related to #2, is there a relationship between mem_page_size, the allocation
of VMs to NUMA nodes, and the flavor size? IOW, if I use the medium flavor
(4GB), will I need a larger mem_page_size? (I'll play with this variation, as
soon as I can). Gets back to understanding how the scheduling determines how to
assign the VMs.


Valid mem_page_size values are determined by the host CPU.  You do not need a 
larger page size for flavors with larger memory sizes.


VMs with numa topology (hugepages, pinned CPUs, pci devices, etc.) will be
pinned to a single host numa node.



Chris


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-09 Thread Steve Gordon
- Original Message -
> From: "Paul Michali" <p...@michali.net>
> To: "OpenStack Development Mailing List (not for usage questions)" 
> <openstack-dev@lists.openstack.org>
> Sent: Tuesday, June 7, 2016 11:00:30 AM
> Subject: Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling
> 
> Anyone have any thoughts on the two questions below? Namely...
> 
> If the huge pages are 2M, we are creating a 2GB VM, have 1945 huge pages,
> should the allocation fail (and if so why)?

Were enough pages (1024) available in a single NUMA node? Which release are you 
using? There was a bug where node 0 would always be picked (and eventually 
exhausted) but that was - theoretically - fixed under 
https://bugs.launchpad.net/nova/+bug/1386236

> Why do all the 2GB VMs get created on the same NUMA node, instead of
> getting evenly assigned to each of the two NUMA nodes that are available on
> the compute node (as a result, allocation fails, when 1/2 the huge pages
> are used)? I found that increasing mem_page_size to 2048 resolves the
> issue, but don't know why.

What was the mem_page_size before it was 2048? I didn't think any smaller value 
was supported.

> ANother thing I was seeing, when the VM create failed due to not enough
> huge pages available and was in error state, I could delete the VM, but the
> Neutron port was still there.  Is that correct?
> 
> I didn't see any log messages in neutron, requesting to unbind and delete
> the port.
> 
> Thanks!
> 
> PCM
> 
> .
> 
> On Fri, Jun 3, 2016 at 2:03 PM Paul Michali <p...@michali.net> wrote:
> 
> > Thanks for the link Tim!
> >
> > Right now, I have two things I'm unsure about...
> >
> > One is that I had 1945 huge pages left (of size 2048k) and tried to create
> > a VM with a small flavor (2GB), which should need 1024 pages, but Nova
> > indicated that it wasn't able to find a host (and QEMU reported an
> > allocation issue).
> >
> > The other is that VMs are not being evenly distributed on my two NUMA
> > nodes, and instead, are getting created all on one NUMA node. Not sure if
> > that is expected (and setting mem_page_size to 2048 is the proper way).
> >
> > Regards,
> >
> > PCM
> >
> >
> > On Fri, Jun 3, 2016 at 1:21 PM Tim Bell <tim.b...@cern.ch> wrote:
> >
> >> The documentation at
> >> http://docs.openstack.org/admin-guide/compute-flavors.html is gradually
> >> improving. Are there areas which were not covered in your clarifications ?
> >> If so, we should fix the documentation too since this is a complex area to
> >> configure and good documentation is a great help.
> >>
> >>
> >>
> >> BTW, there is also an issue around how the RAM for the BIOS is shadowed.
> >> I can’t find the page from a quick google but we found an imbalance when
> >> we
> >> used 2GB pages as the RAM for BIOS shadowing was done by default in the
> >> memory space for only one of the NUMA spaces.
> >>
> >>
> >>
> >> Having a look at the KVM XML can also help a bit if you are debugging.
> >>
> >>
> >>
> >> Tim
> >>
> >>
> >>
> >> *From: *Paul Michali <p...@michali.net>
> >> *Reply-To: *"OpenStack Development Mailing List (not for usage
> >> questions)" <openstack-dev@lists.openstack.org>
> >> *Date: *Friday 3 June 2016 at 15:18
> >> *To: *"Daniel P. Berrange" <berra...@redhat.com>, "OpenStack Development
> >> Mailing List (not for usage questions)" <
> >> openstack-dev@lists.openstack.org>
> >> *Subject: *Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling
> >>
> >>
> >>
> >> See PCM inline...
> >>
> >> On Fri, Jun 3, 2016 at 8:44 AM Daniel P. Berrange <berra...@redhat.com>
> >> wrote:
> >>
> >> On Fri, Jun 03, 2016 at 12:32:17PM +, Paul Michali wrote:
> >> > Hi!
> >> >
> >> > I've been playing with Liberty code a bit and had some questions that
> >> I'm
> >> > hoping Nova folks may be able to provide guidance on...
> >> >
> >> > If I set up a flavor with hw:mem_page_size=2048, and I'm creating
> >> (Cirros)
> >> > VMs with size 1024, will the scheduling use the minimum of the number of
> >>
> >> 1024 what units ? 1024 MB, or 1024 huge pages aka 2048 MB ?
> >>
> >>
> >>
> >> PCM: I was using small flavor, which is 2 GB. So that's 2048 MB and the
&

Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-09 Thread Matt Riedemann

On 6/9/2016 6:15 AM, Paul Michali wrote:



On Wed, Jun 8, 2016 at 11:21 PM Chris Friesen wrote:

On 06/03/2016 12:03 PM, Paul Michali wrote:
> Thanks for the link Tim!
>
> Right now, I have two things I'm unsure about...
>
> One is that I had 1945 huge pages left (of size 2048k) and tried
to create a VM
> with a small flavor (2GB), which should need 1024 pages, but Nova
indicated that
> it wasn't able to find a host (and QEMU reported an allocation issue).
>
> The other is that VMs are not being evenly distributed on my two
NUMA nodes, and
> instead, are getting created all on one NUMA node. Not sure if
that is expected
> (and setting mem_page_size to 2048 is the proper way).


Just in case you haven't figured out the problem...

Have you checked the per-host-numa-node 2MB huge page availability
on your host?
  If it's uneven then that might explain what you're seeing.


These are the observations/questions I have:

1) On the host, I was seeing 32768 huge pages, of 2MB size. When I
created VMs (Cirros) using small flavor, each VM was getting created on
NUMA nodeid 0. When it hit half of the available pages, I could no
longer create any VMs (QEMU saying no space). I'd like to understand why
the assignment was always going two nodeid 0, and to confirm that the
huge pages are divided among the number of NUMA nodes available.

2) I changed mem_page_size from 1024 to 2048 in the flavor, and then
when VMs were created, they were being evenly assigned to the two NUMA
nodes. Each using 1024 huge pages. At this point I could create more
than half, but when there were 1945 pages left, it failed to create a
VM. Did it fail because the mem_page_size was 2048 and the available
pages were 1945, even though we were only requesting 1024 pages?

3) Related to #2, is there a relationship between mem_page_size, the
allocation of VMs to NUMA nodes, and the flavor size? IOW, if I use the
medium flavor (4GB), will I need a larger mem_page_size? (I'll play with
this variation, as soon as I can). Gets back to understanding how the
scheduling determines how to assign the VMs.

4) When the VM create failed due to QEMU failing allocation, the VM went
to error state. I deleted the VM, but the neutron port was still there,
and there were no log messages indicating that a request was made to
delete the port. Is this expected (that the user would have to manually
clean up the port)?


When you hit this case, can you check if instance.host is set in the 
database before deleting the instance? I'm guessing what's happening is 
the instance didn't get assigned a host since it eventually ended up 
with NoValidHost, so when you go to delete, it doesn't have a compute to 
send it to for delete, so it deletes from the compute API, and we don't 
have the host binding details to delete the port.
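
A rough way to do that check, as a sketch only (python-novaclient style auth 
with placeholder admin credentials and a hypothetical instance UUID; the same 
value is the "host" column of the instances table in the Nova database):

    # Sketch: see whether the errored instance ever got a host binding.
    # Credentials, endpoint and UUID below are placeholders.
    from novaclient import client

    nova = client.Client('2', 'admin', 'ADMIN_PASS', 'admin',
                         'http://controller:5000/v2.0')
    server = nova.servers.get('INSTANCE_UUID')
    # OS-EXT-SRV-ATTR:host is only populated once the instance lands on a
    # compute; None/empty here means it never made it past the scheduler.
    print(getattr(server, 'OS-EXT-SRV-ATTR:host', None))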


Although, when the spawn failed in the compute to begin with, we should 
have deallocated any networking that was created before kicking back to 
the scheduler - unless we don't go back to the scheduler if the instance 
is set to ERROR state.
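
If the port does linger, a sketch of how to spot it from the Neutron side 
(python-neutronclient with placeholder credentials; it filters ports by the 
deleted instance's UUID, which Neutron stores as device_id):

    # Sketch: list any ports still tied to the deleted instance.
    from neutronclient.v2_0 import client

    neutron = client.Client(username='admin', password='ADMIN_PASS',
                            tenant_name='admin',
                            auth_url='http://controller:5000/v2.0')
    for port in neutron.list_ports(device_id='INSTANCE_UUID')['ports']:
        print('%s %s %s' % (port['id'], port['status'],
                            port.get('binding:host_id')))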


A bug report with stacktrace of the failure scenario when the instance 
goes to error state, plus the n-cpu logs, would probably help.




5) A coworker had hit the problem mentioned in #1, with exhaustion at
the halfway point. If she deletes a VM, and then changes the flavor to
change the mem_page_size to 2048, should Nova start assigning all new
VMs to the other NUMA node, until the pool of huge pages is down to
where the huge pages are for NUMA node 0, or will it alternate between
the available NUMA nodes (and run out when node 0's pool is exhausted)?

Thanks in advance!

PCM




Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




--

Thanks,

Matt Riedemann


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-09 Thread Paul Michali
On Wed, Jun 8, 2016 at 11:21 PM Chris Friesen 
wrote:

> On 06/03/2016 12:03 PM, Paul Michali wrote:
> > Thanks for the link Tim!
> >
> > Right now, I have two things I'm unsure about...
> >
> > One is that I had 1945 huge pages left (of size 2048k) and tried to
> create a VM
> > with a small flavor (2GB), which should need 1024 pages, but Nova
> indicated that
> > it wasn't able to find a host (and QEMU reported an allocation issue).
> >
> > The other is that VMs are not being evenly distributed on my two NUMA
> nodes, and
> > instead, are getting created all on one NUMA node. Not sure if that is
> expected
> > (and setting mem_page_size to 2048 is the proper way).
>
>
> Just in case you haven't figured out the problem...
>
> Have you checked the per-host-numa-node 2MB huge page availability on your
> host?
>   If it's uneven then that might explain what you're seeing.
>

These are the observations/questions I have:

1) On the host, I was seeing 32768 huge pages, of 2MB size. When I created
VMs (Cirros) using small flavor, each VM was getting created on NUMA nodeid
0. When it hit half of the available pages, I could no longer create any
VMs (QEMU saying no space). I'd like to understand why the assignment was
always going to nodeid 0, and to confirm that the huge pages are divided
among the number of NUMA nodes available.

2) I changed mem_page_size from 1024 to 2048 in the flavor, and then when
VMs were created, they were being evenly assigned to the two NUMA nodes.
Each using 1024 huge pages. At this point I could create more than half,
but when there were 1945 pages left, it failed to create a VM. Did it fail
because the mem_page_size was 2048 and the available pages were 1945, even
though we were only requesting 1024 pages?

3) Related to #2, is there a relationship between mem_page_size, the
allocation of VMs to NUMA nodes, and the flavor size? IOW, if I use the
medium flavor (4GB), will I need a larger mem_page_size? (I'll play with
this variation, as soon as I can). Gets back to understanding how the
scheduling determines how to assign the VMs.

4) When the VM create failed due to QEMU failing allocation, the VM went to
error state. I deleted the VM, but the neutron port was still there, and
there were no log messages indicating that a request was made to delete the
port. Is this expected (that the user would have to manually clean up the
port)?

5) A coworker had hit the problem mentioned in #1, with exhaustion at the
halfway point. If she deletes a VM, and then changes the flavor to change
the mem_page_size to 2048, should Nova start assigning all new VMs to the
other NUMA node, until the pool of huge pages is down to where the huge
pages are for NUMA node 0, or will it alternate between the available NUMA
nodes (and run out when node 0's pool is exhausted)?

Thanks in advance!

PCM




> Chris
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-08 Thread Chris Friesen

On 06/03/2016 12:03 PM, Paul Michali wrote:

Thanks for the link Tim!

Right now, I have two things I'm unsure about...

One is that I had 1945 huge pages left (of size 2048k) and tried to create a VM
with a small flavor (2GB), which should need 1024 pages, but Nova indicated that
it wasn't able to find a host (and QEMU reported an allocation issue).

The other is that VMs are not being evenly distributed on my two NUMA nodes, and
instead, are getting created all on one NUMA node. Not sure if that is expected
(and setting mem_page_size to 2048 is the proper way).



Just in case you haven't figured out the problem...

Have you checked the per-host-numa-node 2MB huge page availability on your host? 
 If it's uneven then that might explain what you're seeing.
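
A quick sketch of that check, reading the standard Linux sysfs counters 
directly (nothing Nova-specific; assumes 2MB pages):

    # Sketch: print free vs. total 2MB huge pages for every host NUMA node.
    import glob
    import os

    for node_dir in sorted(glob.glob('/sys/devices/system/node/node[0-9]*')):
        hp_dir = os.path.join(node_dir, 'hugepages', 'hugepages-2048kB')
        with open(os.path.join(hp_dir, 'nr_hugepages')) as f:
            total = int(f.read())
        with open(os.path.join(hp_dir, 'free_hugepages')) as f:
            free = int(f.read())
        print('%s: %d free / %d total 2MB pages'
              % (os.path.basename(node_dir), free, total))

If one node shows far fewer free pages than the other, that would line up with 
the behaviour described above.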


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-07 Thread Paul Michali
Anyone have any thoughts on the two questions below? Namely...

If the huge pages are 2M, we are creating a 2GB VM, have 1945 huge pages,
should the allocation fail (and if so why)?

Why do all the 2GB VMs get created on the same NUMA node, instead of
getting evenly assigned to each of the two NUMA nodes that are available on
the compute node (as a result, allocation fails, when 1/2 the huge pages
are used)? I found that increasing mem_page_size to 2048 resolves the
issue, but don't know why.

Another thing I was seeing: when the VM create failed due to not enough
huge pages available and was in error state, I could delete the VM, but the
Neutron port was still there.  Is that correct?

I didn't see any log messages in neutron, requesting to unbind and delete
the port.

Thanks!

PCM

.

On Fri, Jun 3, 2016 at 2:03 PM Paul Michali <p...@michali.net> wrote:

> Thanks for the link Tim!
>
> Right now, I have two things I'm unsure about...
>
> One is that I had 1945 huge pages left (of size 2048k) and tried to create
> a VM with a small flavor (2GB), which should need 1024 pages, but Nova
> indicated that it wasn't able to find a host (and QEMU reported an
> allocation issue).
>
> The other is that VMs are not being evenly distributed on my two NUMA
> nodes, and instead, are getting created all on one NUMA node. Not sure if
> that is expected (and setting mem_page_size to 2048 is the proper way).
>
> Regards,
>
> PCM
>
>
> On Fri, Jun 3, 2016 at 1:21 PM Tim Bell <tim.b...@cern.ch> wrote:
>
>> The documentation at
>> http://docs.openstack.org/admin-guide/compute-flavors.html is gradually
>> improving. Are there areas which were not covered in your clarifications ?
>> If so, we should fix the documentation too since this is a complex area to
>> configure and good documentation is a great help.
>>
>>
>>
>> BTW, there is also an issue around how the RAM for the BIOS is shadowed.
>> I can't find the page from a quick google but we found an imbalance when we
>> used 2GB pages as the RAM for BIOS shadowing was done by default in the
>> memory space for only one of the NUMA spaces.
>>
>>
>>
>> Having a look at the KVM XML can also help a bit if you are debugging.
>>
>>
>>
>> Tim
>>
>>
>>
>> *From: *Paul Michali <p...@michali.net>
>> *Reply-To: *"OpenStack Development Mailing List (not for usage
>> questions)" <openstack-dev@lists.openstack.org>
>> *Date: *Friday 3 June 2016 at 15:18
>> *To: *"Daniel P. Berrange" <berra...@redhat.com>, "OpenStack Development
>> Mailing List (not for usage questions)" <
>> openstack-dev@lists.openstack.org>
>> *Subject: *Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling
>>
>>
>>
>> See PCM inline...
>>
>> On Fri, Jun 3, 2016 at 8:44 AM Daniel P. Berrange <berra...@redhat.com>
>> wrote:
>>
>> On Fri, Jun 03, 2016 at 12:32:17PM +, Paul Michali wrote:
>> > Hi!
>> >
>> > I've been playing with Liberty code a bit and had some questions that
>> I'm
>> > hoping Nova folks may be able to provide guidance on...
>> >
>> > If I set up a flavor with hw:mem_page_size=2048, and I'm creating
>> (Cirros)
>> > VMs with size 1024, will the scheduling use the minimum of the number of
>>
>> 1024 what units ? 1024 MB, or 1024 huge pages aka 2048 MB ?
>>
>>
>>
>> PCM: I was using small flavor, which is 2 GB. So that's 2048 MB and the
>> page size is 2048K, so 1024 pages? Hope I have the units right.
>>
>>
>>
>>
>>
>>
>> > huge pages available and the size requested for the VM, or will it base
>> > scheduling only on the number of huge pages?
>> >
>> > It seems to be doing the latter, where I had 1945 huge pages free, and
>> > tried to create another VM (1024) and Nova rejected the request with "no
>> > hosts available".
>>
>> From this I'm guessing you're meaning 1024 huge pages aka 2 GB earlier.
>>
>> Anyway, when you request huge pages to be used for a flavour, the
>> entire guest RAM must be able to be allocated from huge pages.
>> ie if you have a guest with 2 GB of RAM, you must have 2 GB worth
>> of huge pages available. It is not possible for a VM to use
>> 1.5 GB of huge pages and 500 MB of normal sized pages.
>>
>>
>>
>> PCM: Right, so, with 2GB of RAM, I need 1024 huge pages of size 2048K. In
>> this case, there are 1945 huge pages available, so I was wondering why it
>> failed. Maybe I'm confusing sizes/pages?

Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-03 Thread Paul Michali
Thanks for the link Tim!

Right now, I have two things I'm unsure about...

One is that I had 1945 huge pages left (of size 2048k) and tried to create
a VM with a small flavor (2GB), which should need 1024 pages, but Nova
indicated that it wasn't able to find a host (and QEMU reported an
allocation issue).

The other is that VMs are not being evenly distributed on my two NUMA
nodes, and instead, are getting created all on one NUMA node. Not sure if
that is expected (and setting mem_page_size to 2048 is the proper way).

Regards,

PCM


On Fri, Jun 3, 2016 at 1:21 PM Tim Bell <tim.b...@cern.ch> wrote:

> The documentation at
> http://docs.openstack.org/admin-guide/compute-flavors.html is gradually
> improving. Are there areas which were not covered in your clarifications ?
> If so, we should fix the documentation too since this is a complex area to
> configure and good documentation is a great help.
>
>
>
> BTW, there is also an issue around how the RAM for the BIOS is shadowed. I
> can't find the page from a quick google but we found an imbalance when we
> used 2GB pages as the RAM for BIOS shadowing was done by default in the
> memory space for only one of the NUMA spaces.
>
>
>
> Having a look at the KVM XML can also help a bit if you are debugging.
>
>
>
> Tim
>
>
>
> *From: *Paul Michali <p...@michali.net>
> *Reply-To: *"OpenStack Development Mailing List (not for usage
> questions)" <openstack-dev@lists.openstack.org>
> *Date: *Friday 3 June 2016 at 15:18
> *To: *"Daniel P. Berrange" <berra...@redhat.com>, "OpenStack Development
> Mailing List (not for usage questions)" <openstack-dev@lists.openstack.org
> >
> *Subject: *Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling
>
>
>
> See PCM inline...
>
> On Fri, Jun 3, 2016 at 8:44 AM Daniel P. Berrange <berra...@redhat.com>
> wrote:
>
> On Fri, Jun 03, 2016 at 12:32:17PM +, Paul Michali wrote:
> > Hi!
> >
> > I've been playing with Liberty code a bit and had some questions that I'm
> > hoping Nova folks may be able to provide guidance on...
> >
> > If I set up a flavor with hw:mem_page_size=2048, and I'm creating
> (Cirros)
> > VMs with size 1024, will the scheduling use the minimum of the number of
>
> 1024 what units ? 1024 MB, or 1024 huge pages aka 2048 MB ?
>
>
>
> PCM: I was using small flavor, which is 2 GB. So that's 2048 MB and the
> page size is 2048K, so 1024 pages? Hope I have the units right.
>
>
>
>
>
>
> > huge pages available and the size requested for the VM, or will it base
> > scheduling only on the number of huge pages?
> >
> > It seems to be doing the latter, where I had 1945 huge pages free, and
> > tried to create another VM (1024) and Nova rejected the request with "no
> > hosts available".
>
> From this I'm guessing you're meaning 1024 huge pages aka 2 GB earlier.
>
> Anyway, when you request huge pages to be used for a flavour, the
> entire guest RAM must be able to be allocated from huge pages.
> ie if you have a guest with 2 GB of RAM, you must have 2 GB worth
> of huge pages available. It is not possible for a VM to use
> 1.5 GB of huge pages and 500 MB of normal sized pages.
>
>
>
> PCM: Right, so, with 2GB of RAM, I need 1024 huge pages of size 2048K. In
> this case, there are 1945 huge pages available, so I was wondering why it
> failed. Maybe I'm confusing sizes/pages?
>
>
>
>
>
>
> > Is this still the same for Mitaka?
>
> Yep, this use of huge pages has not changed.
>
> > Where could I look in the code to see how the scheduling is determined?
>
> Most logic related to huge pages is in nova/virt/hardware.py
>
> > If I use mem_page_size=large (what I originally had), should it evenly
> > assign huge pages from the available NUMA nodes (there are two in my
> case)?
> >
> > It looks like it was assigning all VMs to the same NUMA node (0) in this
> > case. Is the right way to change to 2048, like I did above?
>
> Nova will always avoid spreading your VM across 2 host NUMA nodes,
> since that gives bad performance characteristics. IOW, it will always
> allocate huge pages from the NUMA node that the guest will run on. If
> you explicitly want your VM to spread across 2 host NUMA nodes, then
> you must tell nova to create 2 *guest* NUMA nodes for the VM. Nova
> will then place each guest NUMA node, on a separate host NUMA node
> and allocate huge pages from node to match. This is done using
> the hw:numa_nodes=2 parameter on the flavour
>
>
>
> PCM: Gotcha, but that was not the issue I'm seeing. With this small flavor
> (2GB = 1024 pages)

Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-03 Thread Tim Bell
The documentation at http://docs.openstack.org/admin-guide/compute-flavors.html 
is gradually improving. Are there areas which were not covered in your 
clarifications ? If so, we should fix the documentation too since this is a 
complex area to configure and good documentation is a great help.

BTW, there is also an issue around how the RAM for the BIOS is shadowed. I 
can't find the page from a quick google but we found an imbalance when we used 
2GB pages as the RAM for BIOS shadowing was done by default in the memory space 
for only one of the NUMA spaces.

Having a look at the KVM XML can also help a bit if you are debugging.
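
For example, a minimal sketch using the libvirt Python bindings (the guest name 
below is hypothetical; on a Nova compute it is normally the instance-XXXXXXXX 
name):

    # Sketch: print the parts of the domain XML that show huge page backing
    # (memoryBacking) and host NUMA placement (numatune).
    import libvirt
    from xml.etree import ElementTree as ET

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-0000000a')  # hypothetical guest name
    root = ET.fromstring(dom.XMLDesc(0))
    for tag in ('memoryBacking', 'numatune'):
        elem = root.find(tag)
        if elem is not None:
            print(ET.tostring(elem))
    conn.close()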

Tim

From: Paul Michali <p...@michali.net>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" 
<openstack-dev@lists.openstack.org>
Date: Friday 3 June 2016 at 15:18
To: "Daniel P. Berrange" <berra...@redhat.com>, "OpenStack Development Mailing 
List (not for usage questions)" <openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

See PCM inline...
On Fri, Jun 3, 2016 at 8:44 AM Daniel P. Berrange 
<berra...@redhat.com<mailto:berra...@redhat.com>> wrote:
On Fri, Jun 03, 2016 at 12:32:17PM +, Paul Michali wrote:
> Hi!
>
> I've been playing with Liberty code a bit and had some questions that I'm
> hoping Nova folks may be able to provide guidance on...
>
> If I set up a flavor with hw:mem_page_size=2048, and I'm creating (Cirros)
> VMs with size 1024, will the scheduling use the minimum of the number of

1024 what units ? 1024 MB, or 1024 huge pages aka 2048 MB ?

PCM: I was using small flavor, which is 2 GB. So that's 2048 MB and the page 
size is 2048K, so 1024 pages? Hope I have the units right.



> huge pages available and the size requested for the VM, or will it base
> scheduling only on the number of huge pages?
>
> It seems to be doing the latter, where I had 1945 huge pages free, and
> tried to create another VM (1024) and Nova rejected the request with "no
> hosts available".

From this I'm guessing you're meaning 1024 huge pages aka 2 GB earlier.

Anyway, when you request huge pages to be used for a flavour, the
entire guest RAM must be able to be allocated from huge pages.
ie if you have a guest with 2 GB of RAM, you must have 2 GB worth
of huge pages available. It is not possible for a VM to use
1.5 GB of huge pages and 500 MB of normal sized pages.

PCM: Right, so, with 2GB of RAM, I need 1024 huge pages of size 2048K. In this 
case, there are 1945 huge pages available, so I was wondering why it failed. 
Maybe I'm confusing sizes/pages?



> Is this still the same for Mitaka?

Yep, this use of huge pages has not changed.

> Where could I look in the code to see how the scheduling is determined?

Most logic related to huge pages is in nova/virt/hardware.py

> If I use mem_page_size=large (what I originally had), should it evenly
> assign huge pages from the available NUMA nodes (there are two in my case)?
>
> It looks like it was assigning all VMs to the same NUMA node (0) in this
> case. Is the right way to change to 2048, like I did above?

Nova will always avoid spreading your VM across 2 host NUMA nodes,
since that gives bad performance characteristics. IOW, it will always
allocate huge pages from the NUMA node that the guest will run on. If
you explicitly want your VM to spread across 2 host NUMA nodes, then
you must tell nova to create 2 *guest* NUMA nodes for the VM. Nova
will then place each guest NUMA node, on a separate host NUMA node
and allocate huge pages from node to match. This is done using
the hw:numa_nodes=2 parameter on the flavour

PCM: Gotcha, but that was not the issue I'm seeing. With this small flavor (2GB 
= 1024 pages), I had 13107 huge pages initially. As I created VMs, they were 
*all* placed on the same NUMA node (0). As a result, when I got to more than 
half the available pages, Nova failed to allow further VMs, even though I had 
6963 available on one compute node, and 5939 on another.

It seems that all the assignments were to node zero. Someone suggested to me to 
set mem_page_size to 2048, and at that point it started assigning to both NUMA 
nodes evenly.

Thanks for the help!!!


Regards,

PCM


> Again, has this changed at all in Mitaka?

Nope. Well aside from random bug fixes.

Regards,
Daniel
--
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-03 Thread Paul Michali
See PCM inline...

On Fri, Jun 3, 2016 at 8:44 AM Daniel P. Berrange 
wrote:

> On Fri, Jun 03, 2016 at 12:32:17PM +, Paul Michali wrote:
> > Hi!
> >
> > I've been playing with Liberty code a bit and had some questions that I'm
> > hoping Nova folks may be able to provide guidance on...
> >
> > If I set up a flavor with hw:mem_page_size=2048, and I'm creating
> (Cirros)
> > VMs with size 1024, will the scheduling use the minimum of the number of
>
> 1024 what units ? 1024 MB, or 1024 huge pages aka 2048 MB ?
>

PCM: I was using small flavor, which is 2 GB. So that's 2048 MB and the
page size is 2048K, so 1024 pages? Hope I have the units right.



> > huge pages available and the size requested for the VM, or will it base
> > scheduling only on the number of huge pages?
> >
> > It seems to be doing the latter, where I had 1945 huge pages free, and
> > tried to create another VM (1024) and Nova rejected the request with "no
> > hosts available".
>
> From this I'm guessing you're meaning 1024 huge pages aka 2 GB earlier.
>
> Anyway, when you request huge pages to be used for a flavour, the
> entire guest RAM must be able to be allocated from huge pages.
> ie if you have a guest with 2 GB of RAM, you must have 2 GB worth
> of huge pages available. It is not possible for a VM to use
> 1.5 GB of huge pages and 500 MB of normal sized pages.
>

PCM: Right, so, with 2GB of RAM, I need 1024 huge pages of size 2048K. In
this case, there are 1945 huge pages available, so I was wondering why it
failed. Maybe I'm confusing sizes/pages?



>
> > Is this still the same for Mitaka?
>
> Yep, this use of huge pages has not changed.
>
> > Where could I look in the code to see how the scheduling is determined?
>
> Most logic related to huge pages is in nova/virt/hardware.py
>
> > If I use mem_page_size=large (what I originally had), should it evenly
> > assign huge pages from the available NUMA nodes (there are two in my
> case)?
> >
> > It looks like it was assigning all VMs to the same NUMA node (0) in this
> > case. Is the right way to change to 2048, like I did above?
>
> Nova will always avoid spreading your VM across 2 host NUMA nodes,
> since that gives bad performance characteristics. IOW, it will always
> allocate huge pages from the NUMA node that the guest will run on. If
> you explicitly want your VM to spread across 2 host NUMA nodes, then
> you must tell nova to create 2 *guest* NUMA nodes for the VM. Nova
> will then place each guest NUMA node, on a separate host NUMA node
> and allocate huge pages from node to match. This is done using
> the hw:numa_nodes=2 parameter on the flavour
>

PCM: Gotcha, but that was not the issue I'm seeing. With this small flavor
(2GB = 1024 pages), I had 13107 huge pages initially. As I created VMs,
they were *all* placed on the same NUMA node (0). As a result, when I got
to more than half the available pages, Nova failed to allow further VMs,
even though I had 6963 available on one compute node, and 5939 on another.

It seems that all the assignments were to node zero. Someone suggested to
me to set mem_page_size to 2048, and at that point it started assigning to
both NUMA nodes evenly.

Thanks for the help!!!


Regards,

PCM


>
> > Again, has this changed at all in Mitaka?
>
> Nope. Well aside from random bug fixes.
>
> Regards,
> Daniel
> --
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/
> :|
> |: http://libvirt.org  -o- http://virt-manager.org
> :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/
> :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc
> :|
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-03 Thread Daniel P. Berrange
On Fri, Jun 03, 2016 at 12:32:17PM +, Paul Michali wrote:
> Hi!
> 
> I've been playing with Liberty code a bit and had some questions that I'm
> hoping Nova folks may be able to provide guidance on...
> 
> If I set up a flavor with hw:mem_page_size=2048, and I'm creating (Cirros)
> VMs with size 1024, will the scheduling use the minimum of the number of

1024 what units ? 1024 MB, or 1024 huge pages aka 2048 MB ?

> huge pages available and the size requested for the VM, or will it base
> scheduling only on the number of huge pages?
> 
> It seems to be doing the latter, where I had 1945 huge pages free, and
> tried to create another VM (1024) and Nova rejected the request with "no
> hosts available".

From this I'm guessing you're meaning 1024 huge pages aka 2 GB earlier.

Anyway, when you request huge pages to be used for a flavour, the
entire guest RAM must be able to be allocated from huge pages.
ie if you have a guest with 2 GB of RAM, you must have 2 GB worth
of huge pages available. It is not possible for a VM to use
1.5 GB of huge pages and 500 MB of normal sized pages.
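
To make the arithmetic concrete, a rough sketch of that fitting rule 
(illustrative only, not the actual code in nova/virt/hardware.py; the per-node 
split is just a guess at how 1945 free pages might have been divided):

    page_kb = 2048                                  # 2MB huge pages
    guest_ram_mb = 2048                             # e.g. m1.small
    pages_needed = guest_ram_mb * 1024 // page_kb   # -> 1024 pages

    # Hypothetical split of the 1945 free pages across the two host NUMA nodes.
    free_pages_per_node = {0: 972, 1: 973}

    def fits_on_some_node(needed, free_per_node):
        # The whole guest has to fit on a single host NUMA node (see below
        # about hw:numa_nodes), so only the per-node counts matter.
        return any(free >= needed for free in free_per_node.values())

    print(fits_on_some_node(pages_needed, free_pages_per_node))  # -> False

So 1945 pages free host-wide can still be too few if neither node has 1024 free 
on its own.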

> Is this still the same for Mitaka?

Yep, this use of huge pages has not changed.

> Where could I look in the code to see how the scheduling is determined?

Most logic related to huge pages is in nova/virt/hardware.py

> If I use mem_page_size=large (what I originally had), should it evenly
> assign huge pages from the available NUMA nodes (there are two in my case)?
> 
> It looks like it was assigning all VMs to the same NUMA node (0) in this
> case. Is the right way to change to 2048, like I did above?

Nova will always avoid spreading your VM across 2 host NUMA nodes,
since that gives bad performance characteristics. IOW, it will always
allocate huge pages from the NUMA node that the guest will run on. If
you explicitly want your VM to spread across 2 host NUMA nodes, then
you must tell nova to create 2 *guest* NUMA nodes for the VM. Nova
will then place each guest NUMA node, on a separate host NUMA node
and allocate huge pages from node to match. This is done using
the hw:numa_nodes=2 parameter on the flavour
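
A sketch of setting those extra specs with python-novaclient (placeholder 
credentials and flavor name; the "nova flavor-key <flavor> set ..." CLI can set 
the same properties):

    # Sketch: two guest NUMA nodes, each backed by 2MB huge pages.
    from novaclient import client

    nova = client.Client('2', 'admin', 'ADMIN_PASS', 'admin',
                         'http://controller:5000/v2.0')
    flavor = nova.flavors.find(name='m1.small')
    flavor.set_keys({'hw:mem_page_size': '2048',
                     'hw:numa_nodes': '2'})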

> Again, has this changed at all in Mitaka?

Nope. Well aside from random bug fixes.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] NUMA, huge pages, and scheduling

2016-06-03 Thread Paul Michali
Hi!

I've been playing with Liberty code a bit and had some questions that I'm
hoping Nova folks may be able to provide guidance on...

If I set up a flavor with hw:mem_page_size=2048, and I'm creating (Cirros)
VMs with size 1024, will the scheduling use the minimum of the number of
huge pages available and the size requested for the VM, or will it base
scheduling only on the number of huge pages?

It seems to be doing the latter, where I had 1945 huge pages free, and
tried to create another VM (1024) and Nova rejected the request with "no
hosts available".

Is this still the same for Mitaka?

Where could I look in the code to see how the scheduling is determined?

If I use mem_page_size=large (what I originally had), should it evenly
assign huge pages from the available NUMA nodes (there are two in my case)?

It looks like it was assigning all VMs to the same NUMA node (0) in this
case. Is the right way to change to 2048, like I did above?

Again, has this changed at all in Mitaka?

Lastly, I had a case where there were not enough huge pages, so the create
failed and the VM was in ERROR state. It had created and bound a neutron
port.  I then deleted the VM. The VM disappeared from the list of VMs, but
the Neutron port was still there. I don't see anything in the neutron log
to request deleting the port.  Shouldn't the port have been unbound/deleted?

Any thoughts on how to figure out why not?


Thanks in advance!

PCM
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev