Re: [Engine-devel] NUMA support action items

2014-04-06 Thread Chegu Vinod

On 4/3/2014 7:11 AM, Gilad Chaplik wrote:

- Original Message -

From: Chegu Vinod chegu_vi...@hp.com
To: Xiao-Lei Shi (Bruce, HP Servers-PSC-CQ) xiao-lei@hp.com
Cc: Einav Cohen eco...@redhat.com, Shang-Chun Liang (David Liang, 
HPservers-Core-OE-PSC)
shangchun.li...@hp.com, Chuan Liao (Jason Liao, HPservers-Core-OE-PSC) 
chuan.l...@hp.com, msi...@redhat.com,
Da-huai Tang (Gary, MCXS-CQ) da-huai.t...@hp.com, Malini Rao m...@redhat.com, 
Eldan Hildesheim
ehild...@redhat.com, Doron Fediuck dfedi...@redhat.com, sher...@redhat.com, 
Alexander Wels
aw...@redhat.com, Gilad Chaplik gchap...@redhat.com
Sent: Thursday, April 3, 2014 3:28:03 PM
Subject: RE: NUMA support action items

Hi Bruce,

The virtual NUMA layout in the guest is a very simple one (not multi-level
etc). It is generated by qemu+seabios... and there is no relationship with
the host NUMA node distances etc.  Let us not worry about gathering Virtual
NUMA node distances for now.

Vinod


CC'ing devel list as well.

Having said that, I don't see a reason not to prepare an infrastructure for
that (if it's free) for future versions (the guest agent will collect vNUMA
data at some point in time).


If you think having this Virtual NUMA topology (along with the virtual 
numa node *distance* info.) really helps some future use cases then pl. 
go ahead...


Vinod





Thanks,
Gilad.


-Original Message-
From: Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ)
Sent: Thursday, April 03, 2014 12:41 AM
To: Vinod, Chegu
Cc: Einav Cohen; Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC);
Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); msi...@redhat.com; Tang,
Da-huai (Gary, MCXS-CQ); Malini Rao; Eldan Hildesheim; Doron Fediuck;
sher...@redhat.com; Alexander Wels; Gilad Chaplik
Subject: RE: NUMA support action items

Hi Vinod,

Is it meaningful for us to collect the distance information of VM NUMA nodes
(maybe in the future, not now)?
In my understanding, the VM NUMA topology is a simulation of a NUMA topology.
Since the vCPUs are just threads, I don't know how the VM NUMA node distances
are calculated in the VM. Is there any relationship between the vNode
distances and the host node distances?

Thanks & Best Regards
Shi, Xiao-Lei (Bruce)

Hewlett-Packard Co., Ltd.
HP Servers Core Platform Software China Telephone +86 23 65683093 Mobile +86
18696583447 Email xiao-lei@hp.com


-Original Message-
From: Vinod, Chegu
Sent: Thursday, April 03, 2014 7:18 AM
To: Gilad Chaplik
Cc: Einav Cohen; Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC);
Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); msi...@redhat.com; Shi,
Xiao-Lei (Bruce, HP Servers-PSC-CQ); Tang, Da-huai (Gary, MCXS-CQ); Malini
Rao; Eldan Hildesheim; Doron Fediuck; sher...@redhat.com; Alexander Wels
Subject: RE: NUMA support action items

Not sure what the correct way to do this is... but here is a suggestion.

Let the host server diagram be very generic, i.e. show the N sockets/nodes
numbered from 0 through N-1.  Show the amount of memory and the list of CPUs
in each of those sockets/nodes.
Draw a generic interconnect fabric [box] in between, which all the sockets
connect to.

Ideally... under that host diagram we could show the NUMA node distances in
text format (as you know, this is derived from numactl -H and then conveyed
from VDSM -> oVirt engine etc).
That distance info. will tell the user what the distance between a pair of
sockets/nodes is (and they can then do what they wish after that :)).
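
A minimal sketch of how the "node distances" block of numactl -H could be
turned into the per-pair text suggested above. The function names and the
sample numbers are hypothetical; this is not VDSM code, and it only assumes
the usual layout of the numactl -H distance table.

```python
from typing import Dict, List

# Sample text in the layout printed by `numactl -H`; the numbers are invented.
SAMPLE_NUMACTL_OUTPUT = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 32768 MB
node 1 cpus: 4 5 6 7
node 1 size: 32768 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
"""


def parse_node_distances(numactl_output: str) -> Dict[int, Dict[int, int]]:
    """Return {from_node: {to_node: distance}} from `numactl -H` text."""
    lines = numactl_output.splitlines()
    markers = [i for i, line in enumerate(lines)
               if line.strip() == "node distances:"]
    if not markers or markers[0] + 1 >= len(lines):
        return {}
    start = markers[0]
    # Header row looks like "node   0   1": skip the word, keep the node ids.
    columns = [int(n) for n in lines[start + 1].split()[1:]]
    distances: Dict[int, Dict[int, int]] = {}
    for line in lines[start + 2:]:
        if ":" not in line:
            break
        label, values = line.split(":", 1)
        distances[int(label)] = dict(zip(columns, map(int, values.split())))
    return distances


def describe_pairs(distances: Dict[int, Dict[int, int]]) -> List[str]:
    """One text line per unordered node pair, as could be shown under a diagram."""
    return [f"node {a} to node {b}: distance {row[b]}"
            for a, row in sorted(distances.items())
            for b in sorted(row) if a < b]
```

Calling describe_pairs(parse_node_distances(SAMPLE_NUMACTL_OUTPUT)) gives one
line per pair of nodes, which is the kind of text that could sit under the
host diagram.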

Vinod

-Original Message-
From: Gilad Chaplik [mailto:gchap...@redhat.com]
Sent: Wednesday, April 02, 2014 4:09 PM
To: Vinod, Chegu
Cc: Einav Cohen; Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC);
Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); msi...@redhat.com; Shi,
Xiao-Lei (Bruce, HP Servers-PSC-CQ); Tang, Da-huai (Gary, MCXS-CQ); Malini
Rao; Eldan Hildesheim; Doron Fediuck; sher...@redhat.com; Alexander Wels
Subject: Re: NUMA support action items

Thank you Vinod for the very elaborate explanation.
GUI-wise, do you want to show those numbers? Maybe for the first phase it is
enough to show them via the API?

A thought: according to your example there could be up to 2 distances, so
maybe the 'closer' nodes can be in the same column or something; I mean to
try and illustrate it graphically rather than with numbers (we have enough of
those :)).

Thanks,
Gilad.

- Original Message -

From: Chegu Vinod chegu_vi...@hp.com
To: Einav Cohen eco...@redhat.com
Cc: Gilad Chaplik gchap...@redhat.com, Shang-Chun Liang (David Liang,
HPservers-Core-OE-PSC)
shangchun.li...@hp.com, Chuan Liao (Jason Liao,
HPservers-Core-OE-PSC) chuan.l...@hp.com, msi...@redhat.com, Xiao-Lei
Shi (Bruce, HP Servers-PSC-CQ) xiao-lei@hp.com, Da-huai Tang
(Gary, MCXS-CQ)
da-huai.t...@hp.com, Malini Rao m...@redhat.com, Eldan Hildesheim
ehild...@redhat.com, Doron Fediuck
dfedi...@redhat.com, sher...@redhat.com, Alexander Wels
aw...@redhat.com
Sent: Saturday, March 29, 2014 8:15:56 AM
Subject: Re: NUMA support

Re: [Engine-devel] Please help us to review our database schema design with NUMA feature on ovirt

2014-04-06 Thread Chegu Vinod

On 4/3/2014 3:46 AM, Gilad Chaplik wrote:

- Original Message -

From: Eli Mesika emes...@redhat.com
To: Xiao-Lei Shi (Bruce, HP Servers-PSC-CQ) xiao-lei@hp.com
Cc: Gilad Chaplik gchap...@redhat.com, Roy Golan rgo...@redhat.com, Omer 
Frenkel ofren...@redhat.com,
Chegu Vinod chegu_vi...@hp.com, Chuan Liao (Jason Liao, HPservers-Core-OE-PSC) 
chuan.l...@hp.com, Doron
Fediuck dfedi...@redhat.com, Shang-Chun Liang (David Liang, 
HPservers-Core-OE-PSC) shangchun.li...@hp.com,
Yaniv Dary yd...@redhat.com, engine-devel@ovirt.org
Sent: Thursday, April 3, 2014 10:54:54 AM
Subject: Re: Please help us to review our database schema design with NUMA 
feature on ovirt



- Original Message -

From: Xiao-Lei Shi (Bruce, HP Servers-PSC-CQ) xiao-lei@hp.com
To: Gilad Chaplik gchap...@redhat.com, Eli Mesika
emes...@redhat.com
Cc: Roy Golan rgo...@redhat.com, Omer Frenkel ofren...@redhat.com,
Chegu Vinod chegu_vi...@hp.com, Chuan
Liao (Jason Liao, HPservers-Core-OE-PSC) chuan.l...@hp.com, Doron
Fediuck dfedi...@redhat.com, Shang-Chun
Liang (David Liang, HPservers-Core-OE-PSC) shangchun.li...@hp.com,
Yaniv Dary yd...@redhat.com,
engine-devel@ovirt.org
Sent: Thursday, April 3, 2014 7:25:11 AM
Subject: RE: Please help us to review our database schema design with NUMA
feature on ovirt

Hi all,
Please see my comments in line.

Thanks & Best Regards
Shi, Xiao-Lei (Bruce)

Hewlett-Packard Co., Ltd.
HP Servers Core Platform Software China
Telephone +86 23 65683093
Mobile +86 18696583447
Email xiao-lei@hp.com

-Original Message-
From: Gilad Chaplik [mailto:gchap...@redhat.com]
Sent: Thursday, April 03, 2014 9:05 AM
To: Eli Mesika
Cc: Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ); Roy Golan; Omer Frenkel;
Vinod,
Chegu; Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); Doron Fediuck;
Liang, Shang-Chun (David Liang, HPservers-Core-OE-PSC); Yaniv Dary;
engine-devel@ovirt.org
Subject: Re: Please help us to review our database schema design with NUMA
feature on ovirt

Hi all,
Sorry for joining-in late.

My comments (according to the db diagram section in
https://docs.google.com/document/d/1-wdDkm6EDbwyoCIRPPcmbGWAcyQo_ISTY8ykDr0I6VY):
1) Join vm_numa_node and vds_numa_node into a single table (they are almost
identical); one of the FKs can be null.
[Bruce] I prefer two tables. Actually the host level NUMA node and the VM
level NUMA node are different objects. In my understanding, the VM level NUMA
node is just a simulation of the host level NUMA node, and the host level
NUMA node has more features that are not in VM NUMA (like the several levels
of host NUMA topology mentioned by Vinod). We need to consider the extensions
of host NUMA in the future.

What future extension are you referring to ?


Not sure how relevant this is to the discussion but a little bit of 
background info. here :


A VM's virtual NUMA node topology is generated by qemu+seabios and is
based on options specified on the qemu command line (libvirt translates
the information in the VM's XML file and invokes the qemu command line
with the correct options).
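
A rough sketch of that translation step, assuming each virtual NUMA cell is
described by a CPU range and a memory size in KiB, and assuming the
"-numa node,nodeid=...,cpus=...,mem=..." option form that qemu accepted at
the time of this thread. The function name and inputs are hypothetical and
this is not libvirt's actual code.

```python
def numa_args(cells):
    """Build qemu -numa arguments from (cpus, memory_kib) pairs.

    cells: list of (cpus, memory_kib) tuples, one per virtual NUMA node,
    e.g. ("0-3", 4194304). Node ids are assigned in list order.
    """
    args = []
    for node_id, (cpus, memory_kib) in enumerate(cells):
        mem_mib = memory_kib // 1024  # the mem= value is given in MiB here
        args += ["-numa", f"node,nodeid={node_id},cpus={cpus},mem={mem_mib}"]
    return args
```

For two cells of four vCPUs and 4 GiB each this yields two "-numa node,..."
options, one per virtual node.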


Today there is no support in qemu+seabios for generating multiple levels
of virtual NUMA. The vast majority of hosts out there (i.e. 2 socket
and 4 socket hosts) have only a single level of NUMA topology... so this is
fine for now. (Multi-level NUMA support in qemu+seabios is a
slightly different topic... and may (or may not) be pursued as a
potential future enhancement for qemu, so for now let us not worry
about such things & over-engineer the oVirt infrastructure etc. for
multi-level virtual NUMA nodes.)


The values for the node distances in the virtual NUMA topology are
auto-generated defaults (by qemu+seabios) and have no relation to the
node distances in the host NUMA topology (which are extracted from the
ACPI SLIT tables and are supposed to be representative of the underlying
system fabric's inter-node latency capabilities etc).


All the guest OS needs to know is that there are multiple [virtual] NUMA
nodes and that these virtual nodes are a single hop away. This helps the
guest to do the right thing with per-node data structure
allocations/locking etc. and helps it scale/perform better.
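
To make "a single hop away" concrete, here is a small sketch of a flat
(single-level) distance matrix and a check for it. The values 10 (local) and
20 (remote) are an assumption for illustration, not something stated in this
thread, and the function names are hypothetical.

```python
def flat_distance_matrix(node_count, local=10, remote=20):
    """Distance matrix where every other node is the same single hop away."""
    return [[local if i == j else remote for j in range(node_count)]
            for i in range(node_count)]


def is_single_level(matrix):
    """True when all off-diagonal distances are identical (one NUMA level)."""
    size = len(matrix)
    remote_values = {matrix[i][j]
                     for i in range(size) for j in range(size) if i != j}
    return len(remote_values) <= 1
```

A host matrix with two different remote distances would fail
is_single_level, which is the multi-level case that the guest topology does
not model today.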




As I mentioned in another email thread : If it makes sense for some 
[current/future] use cases to store this virtual NUMA topology info. 
(along with the node distances) somewhere in the oVirt 
infrastructure...then please feel free to do so.






Let's open the discussion and consider them right now. vNode and Node are the 
same.



Not really sure what I can say here...

A VM's virtual NUMA node should be sized (i.e. cpu count in the node) no 
larger than the host NUMA node. (Ideally they should be of the same size).
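
The sizing rule above could be checked with something like the following
sketch. It assumes the smallest host node is the limit (a conservative
reading of the rule), and the function name is hypothetical.

```python
def oversized_vnodes(vnode_cpu_counts, host_node_cpu_counts):
    """Return indexes of virtual NUMA nodes with more CPUs than a host node.

    Assumption: the smallest host node is used as the limit, so a virtual
    node that passes fits on any host node.
    """
    limit = min(host_node_cpu_counts)
    return [index for index, count in enumerate(vnode_cpu_counts)
            if count > limit]
```

An empty result means every virtual node fits within a host node.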


Vinod


Vinod?


I agree with Bruce. We have no problem with more tables, and constraints
should work as expected and remove entries when a Host or VM is removed.
I do not like tables that have 2 UUIDs when one of them is null; this is
against simple DB normalization.
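
A small sketch of the two-table variant with cascading deletes. SQLite is
used here only so the example runs standalone; the column names and the
parent tables "vds" and "vm" are simplified assumptions, not the schema from
the design document.

```python
import sqlite3

# Hypothetical, simplified schema: one NUMA node table per owner type,
# each with a single non-null foreign key and ON DELETE CASCADE.
SCHEMA = """
CREATE TABLE vds (vds_id TEXT PRIMARY KEY);
CREATE TABLE vm  (vm_id  TEXT PRIMARY KEY);
CREATE TABLE vds_numa_node (
    vds_id     TEXT    NOT NULL REFERENCES vds(vds_id) ON DELETE CASCADE,
    node_index INTEGER NOT NULL,
    PRIMARY KEY (vds_id, node_index)
);
CREATE TABLE vm_numa_node (
    vm_id      TEXT    NOT NULL REFERENCES vm(vm_id) ON DELETE CASCADE,
    node_index INTEGER NOT NULL,
    PRIMARY KEY (vm_id, node_index)
);
"""


def remaining_after_host_delete():
    """Delete a host and report (host NUMA rows, VM NUMA rows) left behind."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this per connection
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO vds VALUES ('host-1')")
    conn.execute("INSERT INTO vm VALUES ('vm-1')")
    conn.executemany("INSERT INTO vds_numa_node VALUES ('host-1', ?)",
                     [(0,), (1,)])
    conn.execute("INSERT INTO vm_numa_node VALUES ('vm-1', 0)")
    conn.execute("DELETE FROM vds WHERE vds_id = 'host-1'")
    host_rows = conn.execute("SELECT COUNT(*) FROM vds_numa_node").fetchone()[0]
    vm_rows = conn.execute("SELECT COUNT(*) FROM vm_numa_node").fetchone()[0]
    conn.close()
    return host_rows, vm_rows
```

Removing the host removes its NUMA node rows through the constraint, while
the VM's rows are untouched, and neither table needs a nullable owner column.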


We are going

Re: [Engine-devel] Please help us to review our database schema design with NUMA feature on ovirt

2014-04-01 Thread Chegu Vinod

On 3/31/2014 2:38 AM, Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ) wrote:

We put the host level NUMA fields in vds_dynamic because this information
comes from the host itself, and the NUMA topology may change if the host's
hardware changes.
Can you please elaborate ? Are you thinking about resource (cpu and/or 
memory) hot plug on the host ?


Vinod


NUMA information is similar to the host's cpu topology information like
cpu_cores and cpu_sockets, which are in vds_dynamic; we followed this example.
VM level NUMA fields are configured by the user, and we originally thought
they should be in vm_dynamic. But we found that the field of another feature,
cpuPin, which is similar to the NUMA feature, is in vm_static, so we put the
VM NUMA fields in vm_static.
Do you think we need to put the VM level NUMA fields in vm_dynamic?

Thanks & Best Regards
Shi, Xiao-Lei (Bruce)

Hewlett-Packard Co., Ltd.
HP Servers Core Platform Software China
Telephone +86 23 65683093
Mobile +86 18696583447
Email xiao-lei@hp.com


-Original Message-
From: Gilad Chaplik [mailto:gchap...@redhat.com]
Sent: Monday, March 31, 2014 5:22 PM
To: Eli Mesika; Roy Golan
Cc: Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ); Liao, Chuan (Jason Liao, 
HPservers-Core-OE-PSC); Doron Fediuck; Vinod, Chegu; Liang, Shang-Chun (David 
Liang, HPservers-Core-OE-PSC); Yaniv Dary; engine-devel@ovirt.org
Subject: Re: Please help us to review our database schema design with NUMA 
feature on ovirt

+1

IMO: vds data should reside in static.
For the VM, we need to think about it.

Roy?

Thanks,
Gilad.


- Original Message -

From: Eli Mesika emes...@redhat.com
To: Xiao-Lei Shi (Bruce, HP Servers-PSC-CQ) xiao-lei@hp.com
Cc: Chuan Liao (Jason Liao, HPservers-Core-OE-PSC) chuan.l...@hp.com, Doron 
Fediuck dfedi...@redhat.com,
Gilad Chaplik gchap...@redhat.com, Chegu Vinod chegu_vi...@hp.com, 
Shang-Chun Liang (David Liang,
HPservers-Core-OE-PSC) shangchun.li...@hp.com, Yaniv Dary 
yd...@redhat.com, engine-devel@ovirt.org
Sent: Monday, March 31, 2014 12:12:50 PM
Subject: Re: Please help us to review our database schema design with NUMA 
feature on ovirt



- Original Message -

From: Xiao-Lei Shi (Bruce, HP Servers-PSC-CQ) xiao-lei@hp.com
To: Eli Mesika emes...@redhat.com
Cc: Chuan Liao (Jason Liao, HPservers-Core-OE-PSC) chuan.l...@hp.com,
Doron Fediuck dfedi...@redhat.com,
Gilad Chaplik gchap...@redhat.com, Chegu Vinod chegu_vi...@hp.com,
Shang-Chun Liang (David Liang,
HPservers-Core-OE-PSC) shangchun.li...@hp.com, Yaniv Dary
yd...@redhat.com, engine-devel@ovirt.org
Sent: Monday, March 31, 2014 8:56:20 AM
Subject: RE: Please help us to review our database schema design with NUMA
feature on ovirt

Include the devel group.
Thanks Eli for the quick responses on our first design, and sorry for the
nag.
We appreciate any comments on our database design and will follow the
design for the implementation if there are no more comments.
  http://www.ovirt.org/Features/Detailed_NUMA_and_Virtual_NUMA

Seems OK to me except for an unanswered question I had asked in my first
review:

Why are the host level NUMA fields added to vds_dynamic while the VM
level fields are added to vm_static?
I would expect both to be in either static or dynamic; can you please
explain? Thanks


Thanks & Best Regards
Shi, Xiao-Lei (Bruce)

Hewlett-Packard Co., Ltd.
HP Servers Core Platform Software China
Telephone +86 23 65683093
Mobile +86 18696583447
Email xiao-lei@hp.com

-Original Message-
From: Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ)
Sent: Friday, March 28, 2014 1:30 PM
To: 'Eli Mesika'
Cc: Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); Doron Fediuck; Gilad
Chaplik; Vinod, Chegu; Liang, Shang-Chun (David Liang,
HPservers-Core-OE-PSC); Yaniv Dary
Subject: RE: Please help us to review our database schema design with NUMA
feature on ovirt

Hi Eli,

After the UX design meeting, we made some modifications to the database
schema, and merged some updates according to your last review comments.
The document has now been posted on the oVirt wiki page; could you help to
review the database design again:
http://www.ovirt.org/Features/Detailed_NUMA_and_Virtual_NUMA


Thanks & Best Regards
Shi, Xiao-Lei (Bruce)

Hewlett-Packard Co., Ltd.
HP Servers Core Platform Software China Telephone +86 23 65683093 Mobile
+86
18696583447 Email xiao-lei@hp.com


-Original Message-
From: Eli Mesika [mailto:emes...@redhat.com]
Sent: Monday, March 24, 2014 6:24 PM
To: Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ)
Cc: Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); Doron Fediuck; Gilad
Chaplik; Vinod, Chegu; Liang, Shang-Chun (David Liang,
HPservers-Core-OE-PSC); Yaniv Dary
Subject: Re: Please help us to review our database schema design with NUMA
feature on ovirt



- Original Message -

From: Xiao-Lei Shi (Bruce, HP Servers-PSC-CQ) xiao-lei@hp.com
To: Eli Mesika emes...@redhat.com, Chuan Liao (Jason Liao,
HPservers-Core-OE-PSC) chuan.l...@hp.com
Cc: Doron Fediuck dfedi...@redhat.com, Gilad Chaplik

Re: [Engine-devel] Please help us to review our database schema design with NUMA feature on ovirt

2014-04-01 Thread Chegu Vinod

On 3/31/2014 7:13 PM, Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ) wrote:

Assemble the related discussions in this mail session.

Hi Vinod,
On 3/31/2014 2:38 AM, Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ) wrote:

We put the host level NUMA fields in vds_dynamic because this information
comes from the host itself, and the NUMA topology may change if the host's
hardware changes.

Can you please elaborate ? Are you thinking about resource (cpu and/or
memory) hot plug on the host ?
[Bruce] It's not about resource hot plug. In the oVirt engine, there is a
scheduled task which refreshes the hosts' and VMs' information periodically.
Only the dynamic and statistics data are updated during the refresh. So I
think the resource information, such as cpu and/or memory, should be in
dynamic and statistics. In my understanding, the information in the dynamic
class is changeable information with a low varying frequency, like cpu
topology, libvirt/kernel versions, etc.
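
A minimal sketch of the refresh behaviour described above, where only the
dynamic and statistics parts of a host are overwritten. All class and field
names other than cpu_sockets and cpu_cores are hypothetical, and this is not
the engine's actual code.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class HostStatic:
    name: str  # user-configured, never touched by the refresh


@dataclass
class HostDynamic:
    cpu_sockets: int
    cpu_cores: int
    kernel_version: str  # changes rarely


@dataclass
class HostStatistics:
    cpu_usage_percent: float  # changes often
    mem_free_mb: int


@dataclass
class Host:
    static: HostStatic
    dynamic: HostDynamic
    statistics: HostStatistics


def refresh(host, reported):
    """Copy reported values into the dynamic and statistics parts only.

    Returns the names of the fields that were updated; keys that belong to
    the static part (or to nothing) are ignored.
    """
    updated = []
    for part in (host.dynamic, host.statistics):
        for f in fields(part):
            if f.name in reported:
                setattr(part, f.name, reported[f.name])
                updated.append(f.name)
    return updated
```

With this split, anything placed in the static part stays as configured no
matter what the host reports, which is the argument made for keeping
user-configured VM fields in vm_static.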


Hmm...just to be clear.

If one were to exclude resource hot-plug scenarios on the Host...then I 
would consider the following to be static and not dynamic :


- # of NUMA nodes
- # of CPUs in each NUMA node
- Amount of installed memory in each NUMA node
- The NUMA node distances

I don't know enough about oVirt's ability to keep track of
(or) orchestrate host level resource hot plug... but
if resource hot plug is to be included in the mix then the # of CPUs in a
NUMA node and the amount of memory in a given NUMA node could change
(i.e. some CPUs or some sections of memory ranges could be offlined or
onlined using hot plug features on the host).
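
A small sketch of how such a change could be noticed, assuming the per-node
CPU lists are available as range strings like "0-3,8-11" (the format Linux
exposes in /sys/devices/system/node/nodeN/cpulist). The function names and
the snapshot dictionaries are hypothetical.

```python
def parse_cpulist(text):
    """Expand a range string such as "0-3,8-11" into a list of CPU ids."""
    cpus = []
    for chunk in text.strip().split(","):
        if not chunk:
            continue
        first, _, last = chunk.partition("-")
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus


def changed_nodes(before, after):
    """Return node ids whose CPU set differs between two snapshots.

    before/after: {node_id: cpulist string}; a node missing from one
    snapshot is treated as having no CPUs in it.
    """
    return sorted(node for node in set(before) | set(after)
                  if parse_cpulist(before.get(node, ""))
                  != parse_cpulist(after.get(node, "")))
```

If no hot plug happens, changed_nodes stays empty between refreshes, which
matches the view that these values are effectively static.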


I can see the libvirt, qemu versions etc. changing (with less frequency,
based on user updates etc.)... but for host kernel versions to actually
change one would most likely require a reboot of the host, at which point
I would guess that all of the rebooted host's information would have to be
synch'd up as part of handshakes between VDSM and the oVirt engine.



The information in statistics class is the information with a high varying 
frequency, like the usage of cpu/memory, etc. In my opinion, it's reasonable to 
put host level NUMA information in vds_dynamic and host level NUMA statistics 
information in vds_statistics.


Got that part...

Thanks
Vinod


Hi Gilad/Roy/Omer,
I don't know if my understanding is correct. But according to this guess, I 
think it's also reasonable to put vm cpuPin information in vm_static. Because 
cpuPin is user configured information, it will not vary automatically. So we 
don’t need to refresh this information periodically. Please correct me if there 
are any mistakes.

Hi Eli,
Sorry for the nag. If my understanding above is correct, I think we should
still put the host level NUMA fields in vds_dynamic/vds_statistics and the VM
level NUMA fields in vm_static, since the VM level NUMA fields are configured
by the user and will not vary automatically.


Thanks & Best Regards
Shi, Xiao-Lei (Bruce)

Hewlett-Packard Co., Ltd.
HP Servers Core Platform Software China
Telephone +86 23 65683093
Mobile +86 18696583447
Email xiao-lei@hp.com


-Original Message-
From: Gilad Chaplik [mailto:gchap...@redhat.com]
Sent: Monday, March 31, 2014 9:31 PM
To: Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ); Roy Golan; Omer Frenkel
Cc: Eli Mesika; Roy Golan; Liao, Chuan (Jason Liao, HPservers-Core-OE-PSC); 
Doron Fediuck; Vinod, Chegu; Liang, Shang-Chun (David Liang, 
HPservers-Core-OE-PSC); Yaniv Dary; engine-devel@ovirt.org
Subject: Re: Please help us to review our database schema design with NUMA 
feature on ovirt

Adding Roy & Omer.

Why is CPU topology in dynamic?

Thanks,
Gilad.

- Original Message -

From: Xiao-Lei Shi (Bruce, HP Servers-PSC-CQ) xiao-lei@hp.com
To: Eli Mesika emes...@redhat.com
Cc: Gilad Chaplik gchap...@redhat.com, Roy Golan
rgo...@redhat.com, Chuan Liao (Jason Liao, HPservers-Core-OE-PSC) chuan.l...@hp.com, Doron 
Fediuck dfedi...@redhat.com, Chegu Vinod
chegu_vi...@hp.com, Shang-Chun Liang (David Liang, HPservers-Core-OE-PSC) 
shangchun.li...@hp.com, Yaniv Dary
yd...@redhat.com, engine-devel@ovirt.org
Sent: Monday, March 31, 2014 3:20:33 PM
Subject: RE: Please help us to review our database schema design with
NUMA feature on ovirt

Thanks Eli.
I will move the vm level NUMA fields to vm_dynamic, and the related
database schema will be updated accordingly.

Thanks & Best Regards
Shi, Xiao-Lei (Bruce)

Hewlett-Packard Co., Ltd.
HP Servers Core Platform Software China Telephone +86 23 65683093
Mobile +86 18696583447 Email xiao-lei@hp.com

-Original Message-
From: Eli Mesika [mailto:emes...@redhat.com]
Sent: Monday, March 31, 2014 5:49 PM
To: Shi, Xiao-Lei (Bruce, HP Servers-PSC-CQ)
Cc: Gilad Chaplik; Roy Golan; Liao, Chuan (Jason Liao,
HPservers-Core-OE-PSC); Doron Fediuck; Vinod, Chegu; Liang, Shang-Chun
(David Liang, HPservers-Core-OE-PSC); Yaniv Dary;
engine-devel@ovirt.org
Subject: Re: Please help us to review our