Re: [Engine-devel] NUMA support action items
On 4/3/2014 7:11 AM, Gilad Chaplik wrote:

----- Original Message -----
From: Chegu Vinod chegu_vi...@hp.com
To: Xiao-Lei Shi (Bruce, HP Servers-PSC-CQ) xiao-lei@hp.com
Cc: Einav Cohen eco...@redhat.com, Shang-Chun Liang (David Liang, HPservers-Core-OE-PSC) shangchun.li...@hp.com, Chuan Liao (Jason Liao, HPservers-Core-OE-PSC) chuan.l...@hp.com, msi...@redhat.com, Da-huai Tang (Gary, MCXS-CQ) da-huai.t...@hp.com, Malini Rao m...@redhat.com, Eldan Hildesheim ehild...@redhat.com, Doron Fediuck dfedi...@redhat.com, sher...@redhat.com, Alexander Wels aw...@redhat.com, Gilad Chaplik gchap...@redhat.com
Sent: Thursday, April 3, 2014 3:28:03 PM
Subject: RE: NUMA support action items

Hi Bruce,

The virtual NUMA layout in the guest is a very simple one (not multi-level etc.). It is generated by qemu+seabios, and there is no relationship with the host NUMA node distances etc. Let us not worry about gathering virtual NUMA node distances for now.

Vinod

CC'ing devel list as well. Having said that, I don't see a reason not to prepare an infrastructure for that (if it's free) for future versions (the guest agent will collect vNUMA data at some point in time).

If you think having this virtual NUMA topology (along with the virtual NUMA node *distance* info) really helps some future use cases, then please go ahead...

Vinod

Thanks, Gilad.

-----Original Message-----
From: Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ)
Sent: Thursday, April 03, 2014 12:41 AM
To: Vinod, Chegu
Cc: Einav Cohen; Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC); Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); msi...@redhat.com; Tang, Da-huai (Gary, MCXS-CQ); Malini Rao; Eldan Hildesheim; Doron Fediuck; sher...@redhat.com; Alexander Wels; Gilad Chaplik
Subject: RE: NUMA support action items

Hi Vinod,

Is it meaningful for us to collect the distance information of VM NUMA nodes (maybe in the future, not now)?
In my understanding, the VM NUMA topology is a simulation of a NUMA topology; since the vCPUs are just threads, I don't know how the VM NUMA node distances are calculated in the VM. Is there any relationship between the vNode distances and the host node distances?

Thanks
Best Regards
Shi, Xiao-Lei (Bruce)
Hewlett-Packard Co., Ltd.
HP Servers Core Platform Software China
Telephone +86 23 65683093
Mobile +86 18696583447
Email xiao-lei@hp.com

-----Original Message-----
From: Vinod, Chegu
Sent: Thursday, April 03, 2014 7:18 AM
To: Gilad Chaplik
Cc: Einav Cohen; Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC); Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); msi...@redhat.com; Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ); Tang, Da-huai (Gary, MCXS-CQ); Malini Rao; Eldan Hildesheim; Doron Fediuck; sher...@redhat.com; Alexander Wels
Subject: RE: NUMA support action items

Not sure what the correct way to do this is, but here is a suggestion.

Let a given host server diagram be very generic, i.e. show the N sockets/nodes numbered from 0 through N-1. Show the amount of memory and the list of CPUs in each of those sockets/nodes. Draw a generic interconnect fabric [box] in between, which all the sockets connect to.

Ideally, under that host diagram we could show the NUMA node distances in text format (as you know, this is derived from numactl -H and then conveyed from VDSM -> oVirt engine etc.). That distance info will tell the user what the distance between a pair of sockets/nodes is (and they can then do what they wish after that :)).
Vinod

-----Original Message-----
From: Gilad Chaplik [mailto:gchap...@redhat.com]
Sent: Wednesday, April 02, 2014 4:09 PM
To: Vinod, Chegu
Cc: Einav Cohen; Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC); Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); msi...@redhat.com; Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ); Tang, Da-huai (Gary, MCXS-CQ); Malini Rao; Eldan Hildesheim; Doron Fediuck; sher...@redhat.com; Alexander Wels
Subject: Re: NUMA support action items

Thank you, Vinod, for the very elaborate explanation.

GUI-wise, do you want to show those numbers? Maybe for the first phase it is enough to show them via the API?

A thought: according to your example there could be up to 2 distances, so maybe the 'closer' nodes can be in the same column or something; I mean to try and illustrate it graphically rather than with numbers (we have enough of those :)).

Thanks, Gilad.

----- Original Message -----
From: Chegu Vinod chegu_vi...@hp.com
To: Einav Cohen eco...@redhat.com
Cc: Gilad Chaplik gchap...@redhat.com, Shang-Chun Liang (David Liang, HPservers-Core-OE-PSC) shangchun.li...@hp.com, Chuan Liao (Jason Liao, HPservers-Core-OE-PSC) chuan.l...@hp.com, msi...@redhat.com, Xiao-Lei Shi (Bruce, HP Servers-PSC-CQ) xiao-lei@hp.com, Da-huai Tang (Gary, MCXS-CQ) da-huai.t...@hp.com, Malini Rao m...@redhat.com, Eldan Hildesheim ehild...@redhat.com, Doron Fediuck dfedi...@redhat.com, sher...@redhat.com, Alexander Wels aw...@redhat.com
Sent: Saturday, March 29, 2014 8:15:56 AM
Subject: Re: NUMA support
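The two ideas in this thread (showing the numactl -H node distances, and placing the 'closer' nodes together) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the sample text mimics the shape of the "node distances" section of numactl -H for a hypothetical 4-node host, the values are made up, and the function names parse_distances and group_by_distance are hypothetical, not part of VDSM or oVirt engine.

```python
from collections import defaultdict

# Illustrative sample only: the shape of the "node distances" section printed
# by `numactl -H`. The values are made up for a hypothetical 4-node host.
SAMPLE = """\
node distances:
node   0   1   2   3
  0:  10  16  22  22
  1:  16  10  22  22
  2:  22  22  10  16
  3:  22  22  16  10
"""


def parse_distances(text):
    """Return {node: {other_node: distance}} from a `numactl -H` style block."""
    lines = text.splitlines()
    start = next(i for i, line in enumerate(lines)
                 if line.strip() == "node distances:")
    # Header row: "node   0   1   2   3" -> column node ids.
    columns = [int(c) for c in lines[start + 1].split()[1:]]
    matrix = {}
    for row in lines[start + 2:]:
        if ":" not in row:
            break  # end of the distance table
        label, values = row.split(":", 1)
        matrix[int(label)] = dict(zip(columns, map(int, values.split())))
    return matrix


def group_by_distance(matrix, node):
    """Group the other nodes by their distance from `node`, nearest first."""
    tiers = defaultdict(list)
    for other, dist in matrix[node].items():
        if other != node:
            tiers[dist].append(other)
    return dict(sorted(tiers.items()))


matrix = parse_distances(SAMPLE)
tiers_for_node0 = group_by_distance(matrix, 0)
```

With the sample above, node 0 ends up with two tiers (one nearer node, two farther ones), which is the kind of grouping a UI could draw as columns instead of printing the raw numbers.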
Re: [Engine-devel] Please help us to review our database schema design with NUMA feature on ovirt
On 4/3/2014 3:46 AM, Gilad Chaplik wrote:

----- Original Message -----
From: Eli Mesika emes...@redhat.com
To: Xiao-Lei Shi (Bruce, HP Servers-PSC-CQ) xiao-lei@hp.com
Cc: Gilad Chaplik gchap...@redhat.com, Roy Golan rgo...@redhat.com, Omer Frenkel ofren...@redhat.com, Chegu Vinod chegu_vi...@hp.com, Chuan Liao (Jason Liao, HPservers-Core-OE-PSC) chuan.l...@hp.com, Doron Fediuck dfedi...@redhat.com, Shang-Chun Liang (David Liang, HPservers-Core-OE-PSC) shangchun.li...@hp.com, Yaniv Dary yd...@redhat.com, engine-devel@ovirt.org
Sent: Thursday, April 3, 2014 10:54:54 AM
Subject: Re: Please help us to review our database schema design with NUMA feature on ovirt

----- Original Message -----
From: Xiao-Lei Shi (Bruce, HP Servers-PSC-CQ) xiao-lei@hp.com
To: Gilad Chaplik gchap...@redhat.com, Eli Mesika emes...@redhat.com
Cc: Roy Golan rgo...@redhat.com, Omer Frenkel ofren...@redhat.com, Chegu Vinod chegu_vi...@hp.com, Chuan Liao (Jason Liao, HPservers-Core-OE-PSC) chuan.l...@hp.com, Doron Fediuck dfedi...@redhat.com, Shang-Chun Liang (David Liang, HPservers-Core-OE-PSC) shangchun.li...@hp.com, Yaniv Dary yd...@redhat.com, engine-devel@ovirt.org
Sent: Thursday, April 3, 2014 7:25:11 AM
Subject: RE: Please help us to review our database schema design with NUMA feature on ovirt

Hi all,

Please see my comments in line.

Thanks
Best Regards
Shi, Xiao-Lei (Bruce)
Hewlett-Packard Co., Ltd.
HP Servers Core Platform Software China
Telephone +86 23 65683093
Mobile +86 18696583447
Email xiao-lei@hp.com

-----Original Message-----
From: Gilad Chaplik [mailto:gchap...@redhat.com]
Sent: Thursday, April 03, 2014 9:05 AM
To: Eli Mesika
Cc: Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ); Roy Golan; Omer Frenkel; Vinod, Chegu; Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); Doron Fediuck; Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC); Yaniv Dary; engine-devel@ovirt.org
Subject: Re: Please help us to review our database schema design with NUMA feature on ovirt

Hi all,

Sorry for joining in late.
My comments (according to the db diagram section in https://docs.google.com/document/d/1-wdDkm6EDbwyoCIRPPcmbGWAcyQo_ISTY8ykDr0I6VY):

1) Join vm_numa_node and vds_numa_node into a single table (they are almost identical); one of the FKs can be null.

[Bruce] I prefer two tables. Host level NUMA nodes and VM level NUMA nodes are actually different objects. In my understanding, a VM level NUMA node is just a simulation of a host level NUMA node, and the host level NUMA node has more features that are not in VM NUMA (like the several levels of host NUMA topology mentioned by Vinod). We need to consider extensions of host NUMA in the future.

What future extension are you referring to?

Not sure how relevant this is to the discussion, but here is a little bit of background info:

A VM's virtual NUMA node topology is generated by qemu+seabios and is based on options specified on the qemu command line (libvirt translates the information in the VM's xml file and invokes the qemu command line with the correct options). Today there is no support in qemu+seabios for generating multiple levels of virtual NUMA. A vast majority of the hosts out there (i.e. 2 socket and 4 socket hosts) have only a single level of NUMA topology, so this is fine for now. (Multi-level NUMA support in qemu+seabios is a slightly different topic, and may (or may not) be pursued as a potential future enhancement for qemu; so for now let us not worry about such things, or over-engineer the oVirt infrastructure etc. for multi-level virtual NUMA nodes.)

The values for the node distances in the virtual NUMA topology are auto-generated defaults (by qemu+seabios) and have no relation to the node distances in the host NUMA topology (which are extracted from the ACPI SLIT tables and are supposed to be representative of the underlying system fabric's inter-node latency capabilities etc.).
All the guest OS needs to know is that there are multiple [virtual] NUMA nodes and that these virtual nodes are a single hop away. This helps the guest to do the right thing with per-node data structure allocations/locking etc. and helps it scale/perform better.

As I mentioned in another email thread: if it makes sense for some [current/future] use cases to store this virtual NUMA topology info (along with the node distances) somewhere in the oVirt infrastructure, then please feel free to do so.

Let's open the discussion and consider them right now. vNode and Node are the same.

Not really sure what I can say here... A VM's virtual NUMA node should be sized (i.e. cpu count in the node) no larger than the host NUMA node. (Ideally they should be of the same size.)

Vinod

Vinod?

I agree with Bruce; we have no problem with more tables, and constraints should work as expected and remove entries when a Host or VM is removed. I do not like tables that have 2 UUIDs where one of them is null; this is against simple DB normalization.

We are going
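As a rough illustration of the two-table option and of constraints that remove NUMA entries when a Host or VM is removed, here is a minimal sketch using Python's built-in sqlite3 module. It is not the proposed oVirt schema: only the table names vds_numa_node and vm_numa_node come from the thread, while the vds and vm parent tables, all column names, and the use of SQLite instead of the engine's real database are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Hypothetical parent tables standing in for the real host and VM tables.
CREATE TABLE vds (vds_id TEXT PRIMARY KEY);
CREATE TABLE vm  (vm_id  TEXT PRIMARY KEY);

-- Two separate node tables, each with a single non-null owner key.
-- Column names are hypothetical.
CREATE TABLE vds_numa_node (
    vds_id     TEXT NOT NULL REFERENCES vds(vds_id) ON DELETE CASCADE,
    node_index INTEGER NOT NULL,
    cpu_count  INTEGER NOT NULL,
    PRIMARY KEY (vds_id, node_index)
);
CREATE TABLE vm_numa_node (
    vm_id      TEXT NOT NULL REFERENCES vm(vm_id) ON DELETE CASCADE,
    node_index INTEGER NOT NULL,
    cpu_count  INTEGER NOT NULL,
    PRIMARY KEY (vm_id, node_index)
);
""")

conn.execute("INSERT INTO vds VALUES ('host-1')")
conn.execute("INSERT INTO vm VALUES ('vm-1')")
conn.executemany(
    "INSERT INTO vds_numa_node VALUES ('host-1', ?, 8)", [(0,), (1,)]
)
conn.execute("INSERT INTO vm_numa_node VALUES ('vm-1', 0, 4)")

# Removing the host removes its NUMA node rows; the VM's rows are untouched.
conn.execute("DELETE FROM vds WHERE vds_id = 'host-1'")
host_nodes_left = conn.execute(
    "SELECT COUNT(*) FROM vds_numa_node").fetchone()[0]
vm_nodes_left = conn.execute(
    "SELECT COUNT(*) FROM vm_numa_node").fetchone()[0]
```

In this layout neither node table needs a nullable owner column, which is the normalization concern raised above; the cost is two nearly identical table definitions.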
Re: [Engine-devel] Please help us to review our database schema design with NUMA feature on ovirt
On 3/31/2014 2:38 AM, Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ) wrote:

We put host level NUMA fields in vds_dynamic because this information comes from the host itself, and the NUMA topology may change if the host's hardware changes.

Can you please elaborate? Are you thinking about resource (cpu and/or memory) hot plug on the host?

Vinod

NUMA information is similar to the host's cpu topology information, like cpu_cores and cpu_sockets, which are in vds_dynamic; we followed that example. VM level NUMA fields are configured by the user, and we originally thought they should be in vm_dynamic. But we found that the field of another feature, cpuPin, which is similar to the NUMA feature, is in vm_static, so we put the VM NUMA fields in vm_static. Do you think we need to put VM level NUMA fields in vm_dynamic?

Thanks
Best Regards
Shi, Xiao-Lei (Bruce)
Hewlett-Packard Co., Ltd.
HP Servers Core Platform Software China
Telephone +86 23 65683093
Mobile +86 18696583447
Email xiao-lei@hp.com

-----Original Message-----
From: Gilad Chaplik [mailto:gchap...@redhat.com]
Sent: Monday, March 31, 2014 5:22 PM
To: Eli Mesika; Roy Golan
Cc: Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ); Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); Doron Fediuck; Vinod, Chegu; Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC); Yaniv Dary; engine-devel@ovirt.org
Subject: Re: Please help us to review our database schema design with NUMA feature on ovirt

+1

IMO:
vds data should reside in static.
VM: need to think about it.

Roy?

Thanks, Gilad.
----- Original Message -----
From: Eli Mesika emes...@redhat.com
To: Xiao-Lei Shi (Bruce, HP Servers-PSC-CQ) xiao-lei@hp.com
Cc: Chuan Liao (Jason Liao, HPservers-Core-OE-PSC) chuan.l...@hp.com, Doron Fediuck dfedi...@redhat.com, Gilad Chaplik gchap...@redhat.com, Chegu Vinod chegu_vi...@hp.com, Shang-Chun Liang (David Liang, HPservers-Core-OE-PSC) shangchun.li...@hp.com, Yaniv Dary yd...@redhat.com, engine-devel@ovirt.org
Sent: Monday, March 31, 2014 12:12:50 PM
Subject: Re: Please help us to review our database schema design with NUMA feature on ovirt

----- Original Message -----
From: Xiao-Lei Shi (Bruce, HP Servers-PSC-CQ) xiao-lei@hp.com
To: Eli Mesika emes...@redhat.com
Cc: Chuan Liao (Jason Liao, HPservers-Core-OE-PSC) chuan.l...@hp.com, Doron Fediuck dfedi...@redhat.com, Gilad Chaplik gchap...@redhat.com, Chegu Vinod chegu_vi...@hp.com, Shang-Chun Liang (David Liang, HPservers-Core-OE-PSC) shangchun.li...@hp.com, Yaniv Dary yd...@redhat.com, engine-devel@ovirt.org
Sent: Monday, March 31, 2014 8:56:20 AM
Subject: RE: Please help us to review our database schema design with NUMA feature on ovirt

Including the devel group.

Thanks, Eli, for the quick responses to our first design, and sorry for the nag. We appreciate any comments on our database design and will follow the design for the implementation if there are no more comments.
http://www.ovirt.org/Features/Detailed_NUMA_and_Virtual_NUMA

Seems OK to me, except for an unanswered question I asked in my first review: why are the Host level NUMA fields added to vds_dynamic while the VM level fields are added to vm_static? I would expect both to be in either static or dynamic; can you please explain?

Thanks

Thanks
Best Regards
Shi, Xiao-Lei (Bruce)
Hewlett-Packard Co., Ltd.
HP Servers Core Platform Software China
Telephone +86 23 65683093
Mobile +86 18696583447
Email xiao-lei@hp.com

-----Original Message-----
From: Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ)
Sent: Friday, March 28, 2014 1:30 PM
To: 'Eli Mesika'
Cc: Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); Doron Fediuck; Gilad Chaplik; Vinod, Chegu; Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC); Yaniv Dary
Subject: RE: Please help us to review our database schema design with NUMA feature on ovirt

Hi Eli,

After the UX design meeting, we made some modifications to the database schema and merged some updates according to your last review comments. The document has now been posted on the oVirt wiki page; could you help review the database design again:
http://www.ovirt.org/Features/Detailed_NUMA_and_Virtual_NUMA

Thanks
Best Regards
Shi, Xiao-Lei (Bruce)
Hewlett-Packard Co., Ltd.
HP Servers Core Platform Software China
Telephone +86 23 65683093
Mobile +86 18696583447
Email xiao-lei@hp.com

-----Original Message-----
From: Eli Mesika [mailto:emes...@redhat.com]
Sent: Monday, March 24, 2014 6:24 PM
To: Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ)
Cc: Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); Doron Fediuck; Gilad Chaplik; Vinod, Chegu; Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC); Yaniv Dary
Subject: Re: Please help us to review our database schema design with NUMA feature on ovirt

----- Original Message -----
From: Xiao-Lei Shi (Bruce, HP Servers-PSC-CQ) xiao-lei@hp.com
To: Eli Mesika emes...@redhat.com, Chuan Liao (Jason Liao, HPservers-Core-OE-PSC) chuan.l...@hp.com
Cc: Doron Fediuck dfedi...@redhat.com, Gilad Chaplik
Re: [Engine-devel] Please help us to review our database schema design with NUMA feature on ovirt
On 3/31/2014 7:13 PM, Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ) wrote:

Assemble the related discussions in this mail session.

Hi Vinod,

On 3/31/2014 2:38 AM, Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ) wrote:

We put host level NUMA fields in vds_dynamic because this information comes from the host itself, and the NUMA topology may change if the host's hardware changes.

Can you please elaborate? Are you thinking about resource (cpu and/or memory) hot plug on the host?

[Bruce] It's not about resource hot plug. In oVirt engine, there is a scheduled task which refreshes the hosts' and VMs' information periodically. Only the dynamic and statistics data are updated during the refresh. So I think the resource information, such as cpu and/or memory, should be in dynamic and statistics. And in my understanding, the information in the dynamic class is changeable information with a low varying frequency, like cpu topology, libvirt/kernel versions, etc.

Hmm... just to be clear. If one were to exclude resource hot-plug scenarios on the host, then I would consider the following to be static and not dynamic:

- # of NUMA nodes
- # of CPUs in each of the NUMA nodes
- Amount of installed memory in each of the NUMA nodes
- The NUMA node distances

I don't know enough about oVirt's ability to keep track of (or orchestrate) host level resource hot plug, but if resource hot plug is to be included in the mix, then the # of CPUs in a NUMA node and the amount of memory in a given NUMA node could change (i.e. some CPUs or some sections of memory ranges could be offlined or onlined using hot plug features on the host).

I can see the libvirt, qemu versions etc. changing (with less frequency, based on user updates etc.), but for the host kernel version to actually change one would most likely require a reboot of the host, at which point I would guess that all of the rebooted host's information would have to be synced up as part of the handshakes between VDSM and oVirt engine.
The information in the statistics class is the information with a high varying frequency, like the usage of cpu/memory, etc. In my opinion, it's reasonable to put host level NUMA information in vds_dynamic and host level NUMA statistics information in vds_statistics.

Got that part...

Thanks
Vinod

Hi Gilad/Roy/Omer,

I don't know if my understanding is correct. But according to this guess, I think it's also reasonable to put the VM cpuPin information in vm_static. Because cpuPin is user-configured information, it will not vary automatically. So we don’t need to refresh this information periodically. Please correct me if there are any mistakes.

Hi Eli,

Sorry for the nag. If my understanding above is correct, I think we should still put host level NUMA fields in vds_dynamic/vds_statistics and VM level NUMA fields in vm_static, since VM level NUMA fields are configured by the user and will not vary automatically.

Thanks
Best Regards
Shi, Xiao-Lei (Bruce)
Hewlett-Packard Co., Ltd.
HP Servers Core Platform Software China
Telephone +86 23 65683093
Mobile +86 18696583447
Email xiao-lei@hp.com

-----Original Message-----
From: Gilad Chaplik [mailto:gchap...@redhat.com]
Sent: Monday, March 31, 2014 9:31 PM
To: Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ); Roy Golan; Omer Frenkel
Cc: Eli Mesika; Roy Golan; Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); Doron Fediuck; Vinod, Chegu; Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC); Yaniv Dary; engine-devel@ovirt.org
Subject: Re: Please help us to review our database schema design with NUMA feature on ovirt

Adding Roy and Omer.

Why is CPU topology in dynamic?

Thanks, Gilad.
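The static/dynamic/statistics split described in this message can be sketched in a few lines of Python. This is only an illustration of the refresh behaviour as Bruce describes it (the periodic task updates dynamic and statistics data, never static data); the class name HostRecord, the function periodic_refresh, and all field names are hypothetical and are not taken from the oVirt engine code.

```python
from dataclasses import dataclass, field


@dataclass
class HostRecord:
    # Hypothetical container; only the three-way split comes from the thread.
    static: dict = field(default_factory=dict)      # configured by the user
    dynamic: dict = field(default_factory=dict)     # host-reported, changes rarely
    statistics: dict = field(default_factory=dict)  # host-reported, changes often


def periodic_refresh(record, reported):
    """Apply a host report; only dynamic and statistics data are overwritten."""
    record.dynamic.update(reported.get("dynamic", {}))
    record.statistics.update(reported.get("statistics", {}))
    # record.static is deliberately left untouched by the refresh.
    return record


host = HostRecord(static={"name": "host-1"})
periodic_refresh(host, {
    "static": {"name": "ignored"},               # never applied by the refresh
    "dynamic": {"numa_node_count": 2},           # low varying frequency
    "statistics": {"cpu_usage_percent": 37},     # high varying frequency
})
```

Under this model, a user-configured value such as cpuPin or the VM NUMA settings would sit on the static side, because nothing reported by the host is expected to change it.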
----- Original Message -----
From: Xiao-Lei Shi (Bruce, HP Servers-PSC-CQ) xiao-lei@hp.com
To: Eli Mesika emes...@redhat.com
Cc: Gilad Chaplik gchap...@redhat.com, Roy Golan rgo...@redhat.com, Chuan Liao (Jason Liao, HPservers-Core-OE-PSC) chuan.l...@hp.com, Doron Fediuck dfedi...@redhat.com, Chegu Vinod chegu_vi...@hp.com, Shang-Chun Liang (David Liang, HPservers-Core-OE-PSC) shangchun.li...@hp.com, Yaniv Dary yd...@redhat.com, engine-devel@ovirt.org
Sent: Monday, March 31, 2014 3:20:33 PM
Subject: RE: Please help us to review our database schema design with NUMA feature on ovirt

Thanks Eli.

I will move the VM level NUMA fields to vm_dynamic, and the related database schema will be updated accordingly.

Thanks
Best Regards
Shi, Xiao-Lei (Bruce)
Hewlett-Packard Co., Ltd.
HP Servers Core Platform Software China
Telephone +86 23 65683093
Mobile +86 18696583447
Email xiao-lei@hp.com

-----Original Message-----
From: Eli Mesika [mailto:emes...@redhat.com]
Sent: Monday, March 31, 2014 5:49 PM
To: Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ)
Cc: Gilad Chaplik; Roy Golan; Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); Doron Fediuck; Vinod, Chegu; Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC); Yaniv Dary; engine-devel@ovirt.org
Subject: Re: Please help us to review our