See PCM: Inline...
On Thu, Jun 9, 2016 at 11:42 AM Steve Gordon <[email protected]> wrote:

> ----- Original Message -----
> > From: "Paul Michali" <[email protected]>
> > To: "OpenStack Development Mailing List (not for usage questions)" <[email protected]>
> > Sent: Tuesday, June 7, 2016 11:00:30 AM
> > Subject: Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling
> >
> > Anyone have any thoughts on the two questions below? Namely...
> >
> > If the huge pages are 2M, we are creating a 2GB VM, have 1945 huge pages,
> > should the allocation fail (and if so why)?
>
> Were enough pages (1024) available in a single NUMA node? Which release
> are you using? There was a bug where node 0 would always be picked (and
> eventually exhausted) but that was - theoretically - fixed under
> https://bugs.launchpad.net/nova/+bug/1386236

PCM: This is on Liberty, so it sounds like the bugfix was in there. It's
possible that there were not 1024 pages left on a single NUMA node.

Regards,

PCM

> > Why do all the 2GB VMs get created on the same NUMA node, instead of
> > getting evenly assigned to each of the two NUMA nodes that are available
> > on the compute node (as a result, allocation fails when 1/2 the huge
> > pages are used)? I found that increasing mem_page_size to 2048 resolves
> > the issue, but I don't know why.
>
> What was the mem_page_size before it was 2048? I didn't think any smaller
> value was supported.
>
> > Another thing I was seeing: when the VM create failed due to not enough
> > huge pages available and was in error state, I could delete the VM, but
> > the Neutron port was still there. Is that correct?
> >
> > I didn't see any log messages in neutron requesting to unbind and delete
> > the port.
> >
> > Thanks!
> >
> > PCM
> >
> > On Fri, Jun 3, 2016 at 2:03 PM Paul Michali <[email protected]> wrote:
> >
> > > Thanks for the link Tim!
> > >
> > > Right now, I have two things I'm unsure about...
> > >
> > > One is that I had 1945 huge pages left (of size 2048k) and tried to
> > > create a VM with a small flavor (2GB), which should need 1024 pages,
> > > but Nova indicated that it wasn't able to find a host (and QEMU
> > > reported an allocation issue).
> > >
> > > The other is that VMs are not being evenly distributed on my two NUMA
> > > nodes, and instead are getting created all on one NUMA node. Not sure
> > > if that is expected (and whether setting mem_page_size to 2048 is the
> > > proper way to fix it).
> > >
> > > Regards,
> > >
> > > PCM
> > >
> > > On Fri, Jun 3, 2016 at 1:21 PM Tim Bell <[email protected]> wrote:
> > >
> > >> The documentation at
> > >> http://docs.openstack.org/admin-guide/compute-flavors.html is
> > >> gradually improving. Are there areas which were not covered in your
> > >> clarifications? If so, we should fix the documentation too, since
> > >> this is a complex area to configure and good documentation is a
> > >> great help.
> > >>
> > >> BTW, there is also an issue around how the RAM for the BIOS is
> > >> shadowed. I can't find the page from a quick google, but we found an
> > >> imbalance when we used 2GB pages, as the RAM for BIOS shadowing was
> > >> done by default in the memory space of only one of the NUMA nodes.
> > >>
> > >> Having a look at the KVM XML can also help a bit if you are
> > >> debugging.
> > >>
> > >> Tim
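PCM: On Steve's question about whether enough pages were free on a single
NUMA node: the per-node counters in sysfs on the compute host show this
directly. A small sketch (assuming 2 MB pages and the usual Linux
/sys/devices/system/node layout):

    # Report per-NUMA-node 2 MB huge page usage from sysfs (standard Linux
    # layout assumed; adjust the directory name for other page sizes).
    import glob
    import os

    for node_dir in sorted(glob.glob('/sys/devices/system/node/node*')):
        hp_dir = os.path.join(node_dir, 'hugepages', 'hugepages-2048kB')
        if not os.path.isdir(hp_dir):
            continue
        with open(os.path.join(hp_dir, 'nr_hugepages')) as f:
            total = int(f.read())
        with open(os.path.join(hp_dir, 'free_hugepages')) as f:
            free = int(f.read())
        print('%s: %d pages total, %d free (%d MB free)'
              % (os.path.basename(node_dir), total, free, free * 2))

If the two nodes together show 1945 free pages but neither shows 1024, that
would explain the "no hosts available" failure for a 2 GB guest.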
> > >>
> > >> *From: *Paul Michali <[email protected]>
> > >> *Reply-To: *"OpenStack Development Mailing List (not for usage
> > >> questions)" <[email protected]>
> > >> *Date: *Friday 3 June 2016 at 15:18
> > >> *To: *"Daniel P. Berrange" <[email protected]>, "OpenStack Development
> > >> Mailing List (not for usage questions)" <[email protected]>
> > >> *Subject: *Re: [openstack-dev] [nova] NUMA, huge pages, and scheduling
> > >>
> > >> See PCM inline...
> > >>
> > >> On Fri, Jun 3, 2016 at 8:44 AM Daniel P. Berrange <[email protected]>
> > >> wrote:
> > >>
> > >> On Fri, Jun 03, 2016 at 12:32:17PM +0000, Paul Michali wrote:
> > >> > Hi!
> > >> >
> > >> > I've been playing with Liberty code a bit and had some questions
> > >> > that I'm hoping Nova folks may be able to provide guidance on...
> > >> >
> > >> > If I set up a flavor with hw:mem_page_size=2048, and I'm creating
> > >> > (Cirros) VMs with size 1024, will the scheduling use the minimum of
> > >> > the number of
> > >>
> > >> 1024 what units? 1024 MB, or 1024 huge pages aka 2048 MB?
> > >>
> > >> PCM: I was using the small flavor, which is 2 GB. So that's 2048 MB and
> > >> the page size is 2048K, so 1024 pages? Hope I have the units right.
> > >>
> > >> > huge pages available and the size requested for the VM, or will it
> > >> > base scheduling only on the number of huge pages?
> > >> >
> > >> > It seems to be doing the latter, where I had 1945 huge pages free,
> > >> > and tried to create another VM (1024) and Nova rejected the request
> > >> > with "no hosts available".
> > >>
> > >> From this I'm guessing you're meaning 1024 huge pages aka 2 GB earlier.
> > >>
> > >> Anyway, when you request huge pages to be used for a flavour, the
> > >> entire guest RAM must be able to be allocated from huge pages.
> > >> ie if you have a guest with 2 GB of RAM, you must have 2 GB worth
> > >> of huge pages available. It is not possible for a VM to use
> > >> 1.5 GB of huge pages and 500 MB of normal sized pages.
> > >>
> > >> PCM: Right, so, with 2GB of RAM, I need 1024 huge pages of size 2048K.
> > >> In this case, there are 1945 huge pages available, so I was wondering
> > >> why it failed. Maybe I'm confusing sizes/pages?
> > >>
> > >> > Is this still the same for Mitaka?
> > >>
> > >> Yep, this use of huge pages has not changed.
> > >>
> > >> > Where could I look in the code to see how the scheduling is
> > >> > determined?
> > >>
> > >> Most logic related to huge pages is in nova/virt/hardware.py
> > >>
> > >> > If I use mem_page_size=large (what I originally had), should it
> > >> > evenly assign huge pages from the available NUMA nodes (there are
> > >> > two in my case)?
> > >> >
> > >> > It looks like it was assigning all VMs to the same NUMA node (0) in
> > >> > this case. Is the right way to change to 2048, like I did above?
> > >>
> > >> Nova will always avoid spreading your VM across 2 host NUMA nodes,
> > >> since that gives bad performance characteristics. IOW, it will always
> > >> allocate huge pages from the NUMA node that the guest will run on. If
> > >> you explicitly want your VM to spread across 2 host NUMA nodes, then
> > >> you must tell nova to create 2 *guest* NUMA nodes for the VM. Nova
> > >> will then place each guest NUMA node on a separate host NUMA node and
> > >> allocate huge pages from that node to match. This is done using the
> > >> hw:numa_nodes=2 parameter on the flavour.
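PCM: For reference, the flavor extra specs Daniel mentions can be set from
code as well as the CLI; a minimal sketch with python-novaclient (the auth
URL, credentials, and flavor name below are only placeholders, not values
from this thread):

    # Minimal sketch with python-novaclient / keystoneauth1; placeholders only.
    from keystoneauth1 import loading, session
    from novaclient import client

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(auth_url='http://controller:5000/v3',
                                    username='admin', password='secret',
                                    project_name='admin',
                                    user_domain_name='Default',
                                    project_domain_name='Default')
    nova = client.Client('2', session=session.Session(auth=auth))

    flavor = nova.flavors.find(name='m1.small')
    # Back all guest RAM with 2 MB (2048 KB) huge pages.
    flavor.set_keys({'hw:mem_page_size': '2048'})
    # Only if a guest should span two host NUMA nodes, also request two
    # *guest* NUMA nodes, as Daniel describes:
    # flavor.set_keys({'hw:numa_nodes': '2'})

The CLI equivalent is "nova flavor-key m1.small set hw:mem_page_size=2048".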
> > >>
> > >> PCM: Gotcha, but that was not the issue I'm seeing. With this small
> > >> flavor (2GB = 1024 pages), I had 13107 huge pages initially. As I
> > >> created VMs, they were *all* placed on the same NUMA node (0). As a
> > >> result, when I got to more than half the available pages, Nova failed
> > >> to allow further VMs, even though I had 6963 pages available on one
> > >> NUMA node, and 5939 on the other.
> > >>
> > >> It seems that all the assignments were to node zero. Someone suggested
> > >> to me to set mem_page_size to 2048, and at that point it started
> > >> assigning to both NUMA nodes evenly.
> > >>
> > >> Thanks for the help!!!
> > >>
> > >> Regards,
> > >>
> > >> PCM
> > >>
> > >> > Again, has this changed at all in Mitaka?
> > >>
> > >> Nope. Well, aside from random bug fixes.
> > >>
> > >> Regards,
> > >> Daniel
> > >> --
> > >> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> > >> |: http://libvirt.org -o- http://virt-manager.org :|
> > >> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> > >> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
>
> --
> Steve Gordon,
> Principal Product Manager,
> Red Hat OpenStack Platform
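PCM: For anyone hitting this later, the arithmetic that seems to matter here:
a 2 GB guest backed by 2 MB pages needs 1024 pages, and (per Daniel) they
must all come from the single host NUMA node the guest lands on. So 1945
free pages in total can still fail if neither node has 1024 free. A toy
sketch of that constraint (purely illustrative, with hypothetical per-node
splits; this is not the actual code in nova/virt/hardware.py):

    # Toy illustration of the per-node constraint; NOT the real logic in
    # nova/virt/hardware.py, and the free-page splits below are hypothetical.
    PAGE_SIZE_KB = 2048  # 2 MB huge pages


    def pages_needed(ram_mb):
        # 2048 MB of guest RAM -> 1024 pages of 2048 KB
        return (ram_mb * 1024) // PAGE_SIZE_KB


    def fits_on_host(free_pages_per_node, ram_mb):
        # A guest with a single (default) NUMA topology must fit entirely
        # within one host NUMA node.
        need = pages_needed(ram_mb)
        return any(free >= need for free in free_pages_per_node.values())


    # 1945 pages free in total, but neither node has 1024 free:
    print(fits_on_host({0: 973, 1: 972}, 2048))   # False
    # Same total on the host, but node 0 can hold the whole guest:
    print(fits_on_host({0: 1024, 1: 921}, 2048))  # True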
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
