----- Original Message -----
> From: "Paul Michali" <p...@michali.net>
> To: "OpenStack Development Mailing List (not for usage questions)"
> <openstack-dev@lists.openstack.org>
> Sent: Tuesday, June 7, 2016 11:00:30 AM
> Subject: Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling
>
> Anyone have any thoughts on the two questions below? Namely...
>
> If the huge pages are 2M, we are creating a 2GB VM, have 1945 huge pages,
> should the allocation fail (and if so why)?

Were enough pages (1024) available in a single NUMA node? Which release are
you using? There was a bug where node 0 would always be picked (and
eventually exhausted) but that was - theoretically - fixed under
https://bugs.launchpad.net/nova/+bug/1386236
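If it helps, a quick way to see what each host NUMA node actually has free
is to check sysfs (or ask libvirt) on the compute node - a rough sketch,
assuming 2 MB pages:

    # prints one <path>:<count> line per host NUMA node
    $ grep . /sys/devices/system/node/node*/hugepages/hugepages-2048kB/free_hugepages

    # same information via libvirt, per node and page size
    $ virsh freepages --all

If the node Nova picked had fewer than 1024 free pages, the failure is
expected even though the host-wide total (1945) looks sufficient, since the
whole guest has to be backed from a single host NUMA node.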
> Why do all the 2GB VMs get created on the same NUMA node, instead of
> getting evenly assigned to each of the two NUMA nodes that are available on
> the compute node (as a result, allocation fails when 1/2 the huge pages
> are used)? I found that increasing mem_page_size to 2048 resolves the
> issue, but don't know why.

What was the mem_page_size before it was 2048? I didn't think any smaller
value was supported.
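For reference, the page size request is a flavor extra spec, so it would
have been set with something along these lines (m1.small is just a stand-in
for whatever flavor you are using):

    # request an explicit 2048 KB (2 MB) page size
    $ openstack flavor set m1.small --property hw:mem_page_size=2048

    # or, with the older nova CLI
    $ nova flavor-key m1.small set hw:mem_page_size=2048

As far as I recall the accepted values are small, large, any, or an explicit
size in KB (e.g. 2048, 1048576). On a host that only has 2 MB pages
configured, "large" and "2048" should in principle ask for the same thing,
which is why the difference in placement you saw makes me suspect the bug I
linked above rather than the flavor setting itself.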
> Another thing I was seeing: when the VM create failed due to not enough
> huge pages available and was in error state, I could delete the VM, but the
> Neutron port was still there. Is that correct?
>
> I didn't see any log messages in neutron, requesting to unbind and delete
> the port.
>
> Thanks!
>
> PCM
>
> On Fri, Jun 3, 2016 at 2:03 PM Paul Michali <p...@michali.net> wrote:
>
> > Thanks for the link Tim!
> >
> > Right now, I have two things I'm unsure about...
> >
> > One is that I had 1945 huge pages left (of size 2048k) and tried to create
> > a VM with a small flavor (2GB), which should need 1024 pages, but Nova
> > indicated that it wasn't able to find a host (and QEMU reported an
> > allocation issue).
> >
> > The other is that VMs are not being evenly distributed on my two NUMA
> > nodes, and instead, are getting created all on one NUMA node. Not sure if
> > that is expected (and setting mem_page_size to 2048 is the proper way).
> >
> > Regards,
> >
> > PCM
> >
> > On Fri, Jun 3, 2016 at 1:21 PM Tim Bell <tim.b...@cern.ch> wrote:
> >
> >> The documentation at
> >> http://docs.openstack.org/admin-guide/compute-flavors.html is gradually
> >> improving. Are there areas which were not covered in your clarifications?
> >> If so, we should fix the documentation too, since this is a complex area
> >> to configure and good documentation is a great help.
> >>
> >> BTW, there is also an issue around how the RAM for the BIOS is shadowed.
> >> I can't find the page from a quick google, but we found an imbalance when
> >> we used 2GB pages, as the RAM for BIOS shadowing was done by default in
> >> the memory space of only one of the NUMA nodes.
> >>
> >> Having a look at the KVM XML can also help a bit if you are debugging.
> >>
> >> Tim
> >>
> >> *From: *Paul Michali <p...@michali.net>
> >> *Reply-To: *"OpenStack Development Mailing List (not for usage
> >> questions)" <openstack-dev@lists.openstack.org>
> >> *Date: *Friday 3 June 2016 at 15:18
> >> *To: *"Daniel P. Berrange" <berra...@redhat.com>, "OpenStack Development
> >> Mailing List (not for usage questions)" <openstack-dev@lists.openstack.org>
> >> *Subject: *Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling
> >>
> >> See PCM inline...
> >>
> >> On Fri, Jun 3, 2016 at 8:44 AM Daniel P. Berrange <berra...@redhat.com>
> >> wrote:
> >>
> >> On Fri, Jun 03, 2016 at 12:32:17PM +0000, Paul Michali wrote:
> >> > Hi!
> >> >
> >> > I've been playing with Liberty code a bit and had some questions that
> >> > I'm hoping Nova folks may be able to provide guidance on...
> >> >
> >> > If I set up a flavor with hw:mem_page_size=2048, and I'm creating
> >> > (Cirros) VMs with size 1024, will the scheduling use the minimum of the
> >> > number of
> >>
> >> 1024 what units? 1024 MB, or 1024 huge pages aka 2048 MB?
> >>
> >> PCM: I was using the small flavor, which is 2 GB. So that's 2048 MB and
> >> the page size is 2048K, so 1024 pages? Hope I have the units right.
> >>
> >> > huge pages available and the size requested for the VM, or will it base
> >> > scheduling only on the number of huge pages?
> >> >
> >> > It seems to be doing the latter, where I had 1945 huge pages free, and
> >> > tried to create another VM (1024) and Nova rejected the request with "no
> >> > hosts available".
> >>
> >> From this I'm guessing you're meaning 1024 huge pages aka 2 GB earlier.
> >>
> >> Anyway, when you request huge pages to be used for a flavour, the
> >> entire guest RAM must be able to be allocated from huge pages.
> >> i.e. if you have a guest with 2 GB of RAM, you must have 2 GB worth
> >> of huge pages available. It is not possible for a VM to use
> >> 1.5 GB of huge pages and 500 MB of normal sized pages.
> >>
> >> PCM: Right, so, with 2GB of RAM, I need 1024 huge pages of size 2048K. In
> >> this case, there are 1945 huge pages available, so I was wondering why it
> >> failed. Maybe I'm confusing sizes/pages?
> >>
> >> > Is this still the same for Mitaka?
> >>
> >> Yep, this use of huge pages has not changed.
> >>
> >> > Where could I look in the code to see how the scheduling is determined?
> >>
> >> Most logic related to huge pages is in nova/virt/hardware.py
> >>
> >> > If I use mem_page_size=large (what I originally had), should it evenly
> >> > assign huge pages from the available NUMA nodes (there are two in my
> >> > case)?
> >> >
> >> > It looks like it was assigning all VMs to the same NUMA node (0) in this
> >> > case. Is the right way to change to 2048, like I did above?
> >>
> >> Nova will always avoid spreading your VM across 2 host NUMA nodes,
> >> since that gives bad performance characteristics. IOW, it will always
> >> allocate huge pages from the NUMA node that the guest will run on. If
> >> you explicitly want your VM to spread across 2 host NUMA nodes, then
> >> you must tell nova to create 2 *guest* NUMA nodes for the VM. Nova
> >> will then place each guest NUMA node on a separate host NUMA node
> >> and allocate huge pages from that node to match. This is done using
> >> the hw:numa_nodes=2 parameter on the flavour.
> >>
> >> PCM: Gotcha, but that was not the issue I'm seeing. With this small
> >> flavor (2GB = 1024 pages), I had 13107 huge pages initially. As I created
> >> VMs, they were *all* placed on the same NUMA node (0). As a result, when
> >> I got to more than half the available pages, Nova failed to allow further
> >> VMs, even though I had 6963 available on one NUMA node, and 5939 on the
> >> other.
> >>
> >> It seems that all the assignments were to node zero. Someone suggested to
> >> me to set mem_page_size to 2048, and at that point it started assigning
> >> to both NUMA nodes evenly.
> >>
> >> Thanks for the help!!!
> >>
> >> Regards,
> >>
> >> PCM
> >>
> >> > Again, has this changed at all in Mitaka?
> >>
> >> Nope. Well aside from random bug fixes.
> >>
> >> Regards,
> >> Daniel
> >> --
> >> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> >> |: http://libvirt.org -o- http://virt-manager.org :|
> >> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> >> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
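To expand on Daniel's point above: if you actually do want a guest spread
over both host NUMA nodes, the extra spec would look roughly like this
(flavor name again just an example):

    # create two guest NUMA nodes, each placed on a separate host NUMA node
    $ openstack flavor set m1.small --property hw:numa_nodes=2

And Tim's suggestion about the KVM XML is a good sanity check for where the
pages actually came from - the libvirt domain XML for the instance should
show the huge page backing and the NUMA placement, e.g.:

    # run on the compute node; the instance-XXXXXXXX name is shown by
    # "nova show <uuid>" (the example name here is made up)
    $ virsh dumpxml instance-0000000a | grep -A3 -E 'memoryBacking|numatune'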
--
Steve Gordon,
Principal Product Manager,
Red Hat OpenStack Platform

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev