See PCM: Inline...
On Thu, Jun 9, 2016 at 11:42 AM Steve Gordon <[email protected]> wrote:

> ----- Original Message -----
> > From: "Paul Michali" <[email protected]>
> > To: "OpenStack Development Mailing List (not for usage questions)" <[email protected]>
> > Sent: Tuesday, June 7, 2016 11:00:30 AM
> > Subject: Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling
> >
> > Anyone have any thoughts on the two questions below? Namely...
> >
> > If the huge pages are 2M, we are creating a 2GB VM, have 1945 huge pages,
> > should the allocation fail (and if so why)?
>
> Were enough pages (1024) available in a single NUMA node? Which release
> are you using? There was a bug where node 0 would always be picked (and
> eventually exhausted) but that was - theoretically - fixed under
> https://bugs.launchpad.net/nova/+bug/1386236

PCM: This is on Liberty, so it sounds like the bugfix was in there. It's
possible that there were not 1024 pages left on a single NUMA node.

Regards,

PCM

> > Why do all the 2GB VMs get created on the same NUMA node, instead of
> > getting evenly assigned to each of the two NUMA nodes that are available
> > on the compute node (as a result, allocation fails when 1/2 the huge
> > pages are used)? I found that increasing mem_page_size to 2048 resolves
> > the issue, but I don't know why.
>
> What was the mem_page_size before it was 2048? I didn't think any smaller
> value was supported.
>
> > Another thing I was seeing: when the VM create failed due to not enough
> > huge pages available and was in error state, I could delete the VM, but
> > the Neutron port was still there. Is that correct?
> >
> > I didn't see any log messages in neutron requesting to unbind and delete
> > the port.
> >
> > Thanks!
> >
> > PCM
> >
> > On Fri, Jun 3, 2016 at 2:03 PM Paul Michali <[email protected]> wrote:
> >
> > > Thanks for the link Tim!
> > >
> > > Right now, I have two things I'm unsure about...
> > >
> > > One is that I had 1945 huge pages left (of size 2048k) and tried to
> > > create a VM with a small flavor (2GB), which should need 1024 pages,
> > > but Nova indicated that it wasn't able to find a host (and QEMU
> > > reported an allocation issue).
> > >
> > > The other is that VMs are not being evenly distributed on my two NUMA
> > > nodes, and instead are getting created all on one NUMA node. Not sure
> > > if that is expected (and whether setting mem_page_size to 2048 is the
> > > proper way to fix it).
> > >
> > > Regards,
> > >
> > > PCM
> > >
> > > On Fri, Jun 3, 2016 at 1:21 PM Tim Bell <[email protected]> wrote:
> > >
> > >> The documentation at
> > >> http://docs.openstack.org/admin-guide/compute-flavors.html is
> > >> gradually improving. Are there areas which were not covered in your
> > >> clarifications? If so, we should fix the documentation too, since
> > >> this is a complex area to configure and good documentation is a
> > >> great help.
> > >>
> > >> BTW, there is also an issue around how the RAM for the BIOS is
> > >> shadowed. I can't find the page from a quick google, but we found an
> > >> imbalance when we used 2GB pages, as the RAM for BIOS shadowing was
> > >> done by default in the memory space of only one of the NUMA nodes.
> > >>
> > >> Having a look at the KVM XML can also help a bit if you are
> > >> debugging.
> > >>
> > >> Tim
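PCM: On Steve's question about whether enough pages were free on a single
NUMA node: the per-node counters in sysfs on the compute host show this
directly. A small sketch (assuming 2 MB pages and the usual Linux
/sys/devices/system/node layout):

    # Report per-NUMA-node 2 MB huge page usage from sysfs (standard Linux
    # layout assumed; adjust the directory name for other page sizes).
    import glob
    import os

    for node_dir in sorted(glob.glob('/sys/devices/system/node/node*')):
        hp_dir = os.path.join(node_dir, 'hugepages', 'hugepages-2048kB')
        if not os.path.isdir(hp_dir):
            continue
        with open(os.path.join(hp_dir, 'nr_hugepages')) as f:
            total = int(f.read())
        with open(os.path.join(hp_dir, 'free_hugepages')) as f:
            free = int(f.read())
        print('%s: %d pages total, %d free (%d MB free)'
              % (os.path.basename(node_dir), total, free, free * 2))

If the two nodes together show 1945 free pages but neither shows 1024, that
would explain the "no hosts available" failure for a 2 GB guest.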
> > >>
> > >> *From: *Paul Michali <[email protected]>
> > >> *Reply-To: *"OpenStack Development Mailing List (not for usage
> > >> questions)" <[email protected]>
> > >> *Date: *Friday 3 June 2016 at 15:18
> > >> *To: *"Daniel P. Berrange" <[email protected]>, "OpenStack Development
> > >> Mailing List (not for usage questions)" <[email protected]>
> > >> *Subject: *Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling
> > >>
> > >> See PCM inline...
> > >>
> > >> On Fri, Jun 3, 2016 at 8:44 AM Daniel P. Berrange <[email protected]>
> > >> wrote:
> > >>
> > >> On Fri, Jun 03, 2016 at 12:32:17PM +0000, Paul Michali wrote:
> > >> > Hi!
> > >> >
> > >> > I've been playing with Liberty code a bit and had some questions
> > >> > that I'm hoping Nova folks may be able to provide guidance on...
> > >> >
> > >> > If I set up a flavor with hw:mem_page_size=2048, and I'm creating
> > >> > (Cirros) VMs with size 1024, will the scheduling use the minimum of
> > >> > the number of
> > >>
> > >> 1024 what units? 1024 MB, or 1024 huge pages aka 2048 MB?
> > >>
> > >> PCM: I was using the small flavor, which is 2 GB. So that's 2048 MB and
> > >> the page size is 2048K, so 1024 pages? Hope I have the units right.
> > >>
> > >> > huge pages available and the size requested for the VM, or will it
> > >> > base scheduling only on the number of huge pages?
> > >> >
> > >> > It seems to be doing the latter, where I had 1945 huge pages free,
> > >> > and tried to create another VM (1024) and Nova rejected the request
> > >> > with "no hosts available".
> > >>
> > >> From this I'm guessing you're meaning 1024 huge pages aka 2 GB earlier.
> > >>
> > >> Anyway, when you request huge pages to be used for a flavour, the
> > >> entire guest RAM must be able to be allocated from huge pages.
> > >> ie if you have a guest with 2 GB of RAM, you must have 2 GB worth
> > >> of huge pages available. It is not possible for a VM to use
> > >> 1.5 GB of huge pages and 500 MB of normal sized pages.
> > >>
> > >> PCM: Right, so, with 2GB of RAM, I need 1024 huge pages of size 2048K.
> > >> In this case, there are 1945 huge pages available, so I was wondering
> > >> why it failed. Maybe I'm confusing sizes/pages?
> > >>
> > >> > Is this still the same for Mitaka?
> > >>
> > >> Yep, this use of huge pages has not changed.
> > >>
> > >> > Where could I look in the code to see how the scheduling is
> > >> > determined?
> > >>
> > >> Most logic related to huge pages is in nova/virt/hardware.py
> > >>
> > >> > If I use mem_page_size=large (what I originally had), should it
> > >> > evenly assign huge pages from the available NUMA nodes (there are
> > >> > two in my case)?
> > >> >
> > >> > It looks like it was assigning all VMs to the same NUMA node (0) in
> > >> > this case. Is the right way to change to 2048, like I did above?
> > >>
> > >> Nova will always avoid spreading your VM across 2 host NUMA nodes,
> > >> since that gives bad performance characteristics. IOW, it will always
> > >> allocate huge pages from the NUMA node that the guest will run on. If
> > >> you explicitly want your VM to spread across 2 host NUMA nodes, then
> > >> you must tell nova to create 2 *guest* NUMA nodes for the VM. Nova
> > >> will then place each guest NUMA node on a separate host NUMA node and
> > >> allocate huge pages from that node to match. This is done using the
> > >> hw:numa_nodes=2 parameter on the flavour.
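PCM: For reference, the flavor extra specs Daniel mentions can be set from
code as well as the CLI; a minimal sketch with python-novaclient (the auth
URL, credentials, and flavor name below are only placeholders, not values
from this thread):

    # Minimal sketch with python-novaclient / keystoneauth1; placeholders only.
    from keystoneauth1 import loading, session
    from novaclient import client

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(auth_url='http://controller:5000/v3',
                                    username='admin', password='secret',
                                    project_name='admin',
                                    user_domain_name='Default',
                                    project_domain_name='Default')
    nova = client.Client('2', session=session.Session(auth=auth))

    flavor = nova.flavors.find(name='m1.small')
    # Back all guest RAM with 2 MB (2048 KB) huge pages.
    flavor.set_keys({'hw:mem_page_size': '2048'})
    # Only if a guest should span two host NUMA nodes, also request two
    # *guest* NUMA nodes, as Daniel describes:
    # flavor.set_keys({'hw:numa_nodes': '2'})

The CLI equivalent is "nova flavor-key m1.small set hw:mem_page_size=2048".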
> > >>
> > >> PCM: Gotcha, but that was not the issue I'm seeing. With this small
> > >> flavor (2GB = 1024 pages), I had 13107 huge pages initially. As I
> > >> created VMs, they were *all* placed on the same NUMA node (0). As a
> > >> result, when I got to more than half the available pages, Nova failed
> > >> to allow further VMs, even though I had 6963 pages available on one
> > >> NUMA node, and 5939 on the other.
> > >>
> > >> It seems that all the assignments were to node zero. Someone suggested
> > >> to me to set mem_page_size to 2048, and at that point it started
> > >> assigning to both NUMA nodes evenly.
> > >>
> > >> Thanks for the help!!!
> > >>
> > >> Regards,
> > >>
> > >> PCM
> > >>
> > >> > Again, has this changed at all in Mitaka?
> > >>
> > >> Nope. Well, aside from random bug fixes.
> > >>
> > >> Regards,
> > >> Daniel
> > >> --
> > >> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> > >> |: http://libvirt.org -o- http://virt-manager.org :|
> > >> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> > >> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
>
> --
> Steve Gordon,
> Principal Product Manager,
> Red Hat OpenStack Platform
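PCM: For anyone hitting this later, the arithmetic that seems to matter here:
a 2 GB guest backed by 2 MB pages needs 1024 pages, and (per Daniel) they
must all come from the single host NUMA node the guest lands on. So 1945
free pages in total can still fail if neither node has 1024 free. A toy
sketch of that constraint (purely illustrative, with hypothetical per-node
splits; this is not the actual code in nova/virt/hardware.py):

    # Toy illustration of the per-node constraint; NOT the real logic in
    # nova/virt/hardware.py, and the free-page splits below are hypothetical.
    PAGE_SIZE_KB = 2048  # 2 MB huge pages


    def pages_needed(ram_mb):
        # 2048 MB of guest RAM -> 1024 pages of 2048 KB
        return (ram_mb * 1024) // PAGE_SIZE_KB


    def fits_on_host(free_pages_per_node, ram_mb):
        # A guest with a single (default) NUMA topology must fit entirely
        # within one host NUMA node.
        need = pages_needed(ram_mb)
        return any(free >= need for free in free_pages_per_node.values())


    # 1945 pages free in total, but neither node has 1024 free:
    print(fits_on_host({0: 973, 1: 972}, 2048))   # False
    # Same total on the host, but node 0 can hold the whole guest:
    print(fits_on_host({0: 1024, 1: 921}, 2048))  # True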
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
