Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

2013-01-17 Thread Daniel P. Berrange
On Thu, Jan 17, 2013 at 12:12:35AM +0100, Peter Krempa wrote:
 On 01/16/13 21:24, Daniel P. Berrange wrote:
 On Wed, Jan 16, 2013 at 05:06:21PM -0300, Amador Pahim wrote:
 On 01/16/2013 04:30 PM, Daniel P. Berrange wrote:
 On Wed, Jan 16, 2013 at 02:15:37PM -0500, Peter Krempa wrote:
 - Original Message -
 From: Daniel P. Berrange berra...@redhat.com
 To: Peter Krempa pkre...@redhat.com
 Cc: Jiri Denemark jdene...@redhat.com, Amador Pahim 
 apa...@redhat.com, libvirt-l...@redhat.com, dougsl...@redhat.com
 Sent: Wed, 16 Jan 2013 13:39:28 -0500 (EST)
 Subject: Re: [libvirt] [RFC] Data in the topology element in the
 capabilities XML
 
 On Wed, Jan 16, 2013 at 07:31:02PM +0100, Peter Krempa wrote:
 On 01/16/13 19:11, Daniel P. Berrange wrote:
 On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:
 Hi everybody,
 
 a while ago there was a discussion about changing the data that is
 returned in the topology sub-element:
 
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='2' threads='2'/>
 
 
 The data provided here is as of today taken from the nodeinfo
 detection code and thus is really wrong when the fallback mechanisms
 are used.
 
 To get a useful count, the user has to multiply the data by the
 number of NUMA nodes in the host. With the fallback detection code
 used for nodeinfo the NUMA node count used to get the CPU count
 should be 1 instead of the actual number.
 
 As Jiri proposed, I think we should change this output to separate
 detection code that will not take into account NUMA nodes for this
 output and will rather provide data as the lscpu command does.
 
 This change will make the data provided by the element standalone
 and also usable in guest XMLs to mirror host's topology.
 Well there are 2 parts which need to be considered here. What do we 
 report
 in the host capabilities, and how do you configure guest XML.
 
  From a historical compatibility pov I don't think we should be changing
 the host capabilities at all. Simply document that 'sockets' is treated
 as sockets-per-node everywhere, and that it is wrong in the case of
 machines where a socket can internally have multiple NUMA nodes.
 I too am somewhat concerned about changing this output, for
 historical reasons.
 Apps should be using the separate NUMA topology data in the 
 capabilities
 instead of the CPU topology data, to get accurate CPU counts.
  From the NUMA topology the management apps can't tell if the CPU
 is a core or a thread. For example oVirt/VDSM bases the decisions on
 this information.
 Then, we should add information to the NUMA topology XML to indicate
 which of the child cpu elements are sibling cores or threads.
 
 Perhaps add a 'socket_id' + 'core_id' attribute to every cpu.
 
 In this case, we will also need to add the thread siblings and
 perhaps even core siblings information to allow reliable detection.
 The combination of core_id/socket_id lets you determine that. If two
 cores have the same socket_id then they are cores or threads within the
 same socket. If two cpus have the same socket_id & core_id then they
 are threads within the same core.
 
 Not true for the AMD Magny-Cours 6100 series, where different cores can
 share the same physical_id and core_id, and yet they are not threads.
 These processors have two NUMA nodes inside the same package (aka
 socket) that share the same core ID set. Annoying.
 
 I don't believe there's a problem with that. This example XML
 shows a machine with 4 NUMA nodes and 2 sockets; each socket spans
 two nodes, and each node has 2 cores with 2 threads each, giving 16
 logical CPUs:
 
 <topology>
   <cells num='4'>
     <cell id='0'>
       <cpus num='4'>
         <cpu id='0' socket_id='0' core_id='0'/>
         <cpu id='1' socket_id='0' core_id='0'/>
         <cpu id='2' socket_id='0' core_id='1'/>
         <cpu id='3' socket_id='0' core_id='1'/>
       </cpus>
     </cell>
     <cell id='1'>
       <cpus num='4'>
         <cpu id='4' socket_id='0' core_id='0'/>
         <cpu id='5' socket_id='0' core_id='0'/>
         <cpu id='6' socket_id='0' core_id='1'/>
         <cpu id='7' socket_id='0' core_id='1'/>
       </cpus>
     </cell>
     <cell id='2'>
       <cpus num='4'>
         <cpu id='8'  socket_id='1' core_id='0'/>
         <cpu id='9'  socket_id='1' core_id='0'/>
         <cpu id='10' socket_id='1' core_id='1'/>
         <cpu id='11' socket_id='1' core_id='1'/>
       </cpus>
     </cell>
     <cell id='3'>
       <cpus num='4'>
         <cpu id='12' socket_id='1' core_id='0'/>
         <cpu id='13' socket_id='1' core_id='0'/>
         <cpu id='14' socket_id='1' core_id='1'/>
         <cpu id='15' socket_id='1' core_id='1'/>
       </cpus>
     </cell>
   </cells>
 </topology>
 
 I believe there's enough info there to determine all the co-location
 aspects of all the sockets/cores/threads involved.
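
 To illustrate how an application could consume those attributes, here is a
 short sketch in Python (illustrative only; the element and attribute names
 follow the example above and are not an agreed libvirt schema). CPUs that
 share (cell, socket_id, core_id) are treated as thread siblings; the cell id
 is kept in the key because core_id values repeat across cells in the example:

  import xml.etree.ElementTree as ET
  from collections import defaultdict

  # Abbreviated copy of the proposed XML above (first two cells only).
  TOPOLOGY = """
  <topology>
    <cells num='2'>
      <cell id='0'>
        <cpus num='4'>
          <cpu id='0' socket_id='0' core_id='0'/>
          <cpu id='1' socket_id='0' core_id='0'/>
          <cpu id='2' socket_id='0' core_id='1'/>
          <cpu id='3' socket_id='0' core_id='1'/>
        </cpus>
      </cell>
      <cell id='1'>
        <cpus num='4'>
          <cpu id='4' socket_id='0' core_id='0'/>
          <cpu id='5' socket_id='0' core_id='0'/>
          <cpu id='6' socket_id='0' core_id='1'/>
          <cpu id='7' socket_id='0' core_id='1'/>
        </cpus>
      </cell>
    </cells>
  </topology>
  """

  siblings = defaultdict(list)   # (cell, socket_id, core_id) -> logical cpu ids
  root = ET.fromstring(TOPOLOGY)
  for cell in root.findall('./cells/cell'):
      for cpu in cell.findall('./cpus/cpu'):
          key = (cell.get('id'), cpu.get('socket_id'), cpu.get('core_id'))
          siblings[key].append(int(cpu.get('id')))

  for key, cpus in sorted(siblings.items()):
      print(key, cpus)   # e.g. ('0', '0', '0') [0, 1] -> two thread siblings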
 
 Well not for all 

Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

2013-01-17 Thread Peter Krempa

On 01/17/13 10:36, Daniel P. Berrange wrote:


Well, not for all machines in the wild out there. This is very
similar to the approach libvirt uses now to detect the topology, and it
is not enough to detect threads on AMD Bulldozer, as the cpus
corresponding to the threads have different core_id's (they are also
considered cores from the perspective of the kernel). This is
unfortunate for virtualization management tools such as oVirt that
still consider the AMD Bulldozer module to be one core with two
threads, even if it registers as two cores.

For AMD Bulldozer to be detected correctly, we would need to expose
the thread_id's along with thread-sibling information, so that the two
threads that belong together can be identified.


NB, the socket_id / core_id values in the above XML are *not* intended
to be in any way related to the similarly named values in /proc/cpuinfo.
They are values libvirt assigns to show the topology accurately.


Hm, in that case I'm not sure it's worth bothering to detect the
topology and then (possibly) make up the data provided in the XML. The
management applications will then have to recalculate the topology from
our data just to learn the actual topology, and for other possible uses
of the data a management app would still need to re-detect it on its
own if it needs (for some reason) the real data.


Also, even with made-up data in these fields we won't be able to
reflect the topology of AMD Bulldozer reliably. We would have to choose
whether we'd like to report the modules as cores or as threads, and
each of these choices will possibly make someone complain that they
don't like the choice.


Peter



Daniel





Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

2013-01-17 Thread Daniel P. Berrange
On Thu, Jan 17, 2013 at 03:41:44PM +0100, Peter Krempa wrote:
 On 01/17/13 10:36, Daniel P. Berrange wrote:
 
 Well, not for all machines in the wild out there. This is very
 similar to the approach libvirt uses now to detect the topology, and it
 is not enough to detect threads on AMD Bulldozer, as the cpus
 corresponding to the threads have different core_id's (they are also
 considered cores from the perspective of the kernel). This is
 unfortunate for virtualization management tools such as oVirt that
 still consider the AMD Bulldozer module to be one core with two
 threads, even if it registers as two cores.
 
 For AMD Bulldozer to be detected correctly, we would need to expose
 the thread_id's along with thread-sibling information, so that the two
 threads that belong together can be identified.
 
 NB, the socket_id / core_id values in the above XML are *not* intended
 to be in any way related to the similarly named values in /proc/cpuinfo.
 They are values libvirt assigns to show the topology accurately.
 
 Hm, in that case I'm not sure it's worth bothering to detect the
 topology and then (possibly) make up the data provided in the
 XML. The management applications will then have to recalculate the
 topology from our data just to learn the actual topology, and for
 other possible uses of the data a management app would still need to
 re-detect it on its own if it needs (for some reason) the real
 data.

The point is that libvirt is intended to provide a representation
of the data that is independent of the underlying technology. Apps
using libvirt can't expect to read /proc/cpuinfo directly and then
expect libvirt to match that exactly. They should be using the
libvirt APIs/XML for this exclusively & if there is something missing
which prevents them from doing this, then we need to add it.

 Also, even with made-up data in these fields we won't be able
 to reflect the topology of AMD Bulldozer reliably. We would have
 to choose whether we'd like to report the modules as cores or as
 threads, and each of these choices will possibly make someone
 complain that they don't like the choice.

I think you're creating a problem where none exists - there is a clear
difference between what is a hyperthread vs what is a core, so there
is no ambiguity in what we choose to use. We must simply pick the one
that is right.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

2013-01-17 Thread Peter Krempa

On 01/17/13 15:48, Daniel P. Berrange wrote:

On Thu, Jan 17, 2013 at 03:41:44PM +0100, Peter Krempa wrote:

On 01/17/13 10:36, Daniel P. Berrange wrote:


Well, not for all machines in the wild out there. This is very
similar to the approach libvirt uses now to detect the topology, and it
is not enough to detect threads on AMD Bulldozer, as the cpus
corresponding to the threads have different core_id's (they are also
considered cores from the perspective of the kernel). This is
unfortunate for virtualization management tools such as oVirt that
still consider the AMD Bulldozer module to be one core with two
threads, even if it registers as two cores.

For AMD Bulldozer to be detected correctly, we would need to expose
the thread_id's along with thread-sibling information, so that the two
threads that belong together can be identified.


NB, the socket_id / core_id values in the above XML are *not* intended
to be in any way related to the similarly named values in /proc/cpuinfo.
They are values libvirt assigns to show the topology accurately.


Hm, in that case I'm not sure it's worth bothering to detect the
topology and then (possibly) make up the data provided in the
XML. The management applications will then have to recalculate the
topology from our data just to learn the actual topology, and for
other possible uses of the data a management app would still need to
re-detect it on its own if it needs (for some reason) the real
data.


The point is that libvirt is intended to provide a representation
of the data that is independent of the underlying technology. Apps
using libvirt can't expect to read /proc/cpuinfo directly and then
expect libvirt to match that exactly. They should be using the
libvirt APIs/XML for this exclusively & if there is something missing
which prevents them from doing this, then we need to add it.


Okay, this is fair enough and might work even for non-symmetric 
multiprocessor systems.





Also, even with made-up data in these fields we won't be able
to reflect the topology of AMD Bulldozer reliably. We would have
to choose whether we'd like to report the modules as cores or as
threads, and each of these choices will possibly make someone
complain that they don't like the choice.


I think you're creating a problem where none exists - there is a clear
difference between what is a hyperthread vs what is a core, so there
is no ambiguity in what we choose to use. We must simply pick the one
that is right.



I beg to differ:

This technology is informally called CMT (Clustered Multi-Thread) and 
formally called a module by AMD's marketing, but its real name is 
Clustered Integer Core Technology. In terms of hardware complexity and 
functionality, the module sits midway between a true dual-core processor 
with its full integer power (each thread having a fully independent 
integer core) and a single-core processor with SMT, which presents two 
threads but with the power of one core (each thread sharing the 
resources of the core with the other thread).


(source: 
http://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)#BULLDOZER_Core_.28aka_Module.29 
)


oVirt/VDSM considers the module to be a core with two threads, whereas 
others consider it more like separate cores. The performance depends on 
the type of task that is being run on the module.


The kernel also uses an ambiguous description, where the core IDs of 
the two threads of a module are different, yet the thread_siblings 
field is filled out so that the two threads point at each other.
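
For reference, a minimal sketch (not libvirt's nodeinfo code) that reads the 
standard Linux sysfs topology files for a couple of cpus; on a Bulldozer 
module the two cpus report different core_id values even though their 
thread_siblings_list entries point at each other:

  # Minimal sketch, not libvirt's detection code: dump the standard sysfs
  # topology files for a cpu so the ambiguity is visible directly.
  def read_topology(cpu):
      base = "/sys/devices/system/cpu/cpu%d/topology/" % cpu
      def read(name):
          with open(base + name) as f:
              return f.read().strip()
      return {
          "physical_package_id": read("physical_package_id"),
          "core_id": read("core_id"),
          "thread_siblings_list": read("thread_siblings_list"),
      }

  if __name__ == "__main__":
      for cpu in (0, 1):
          print(cpu, read_topology(cpu))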



Peter


Daniel





Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

2013-01-17 Thread Peter Krempa

On 01/17/13 16:43, Daniel P. Berrange wrote:

I beg to differ:

This technology is informally called CMT (Clustered Multi-Thread)
and formally called a module by AMD's marketing, but its real name
is Clustered Integer Core Technology. In terms of hardware
complexity and functionality, the module sits midway between a true
dual-core processor with its full integer power (each thread having
a fully independent integer core) and a single-core processor with
SMT, which presents two threads but with the power of one core
(each thread sharing the resources of the core with the other
thread).

(source: 
http://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)#BULLDOZER_Core_.28aka_Module.29
)

oVirt/VDSM considers the module to be a core with two threads, whereas
others consider it more like separate cores. The performance depends
on the type of task that is being run on the module.


So what information does VDSM use to identify this hardware topology?



VDSM uses the topology element (subelement of cpu) in the capabilities 
and multiplies the numbers by the number of numa nodes. As the data 
there is taken from nodeinfo, this breaks on some systems.




In some form or another we need to be providing more detailed and
accurate topology information in the capabilities NUMA description.

What does 'hwloc --no-io --no-caches' show on a Bulldozer machine?


# hwloc-info --no-io --no-caches
depth 0:           1 Machine (type #1)
 depth 1:          2 Socket (type #3)
  depth 2:         4 NUMANode (type #2)
   depth 3:       32 Core (type #5)
    depth 4:      32 PU (type #6)

hwloc recognizes it as cores ...

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2

... lscpu sees them as threads ...

NUMA node(s):          4
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 1
Stepping:              2
CPU MHz:               1400.000
BogoMIPS:              4389.80
Virtualization:        AMD-V
L1d cache:             16K
L1i cache:             64K
L2 cache:              2048K
L3 cache:              6144K
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
NUMA node2 CPU(s):     16-23
NUMA node3 CPU(s):     24-31




Daniel





Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

2013-01-17 Thread Daniel P. Berrange
On Thu, Jan 17, 2013 at 05:14:30PM +0100, Peter Krempa wrote:
 On 01/17/13 16:43, Daniel P. Berrange wrote:
 VDSM uses the topology element (subelement of cpu) in the
 capabilities and multiplies the numbers by the number of numa nodes.
 As the data there is taken from nodeinfo, this breaks on some
 systems.

If it is merely looking at the current CPU topology then I see
no reason why we can't make enough information available in the
NUMA topology to let it do the right thing on all systems.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

2013-01-17 Thread Peter Krempa

On 01/17/13 17:56, Daniel P. Berrange wrote:

On Thu, Jan 17, 2013 at 05:14:30PM +0100, Peter Krempa wrote:

On 01/17/13 16:43, Daniel P. Berrange wrote:
VDSM uses the topology element (subelement of cpu) in the
capabilities and multiplies the numbers by the number of numa nodes.
As the data there is taken from nodeinfo, this breaks on some
systems.


If it is merely looking at the current CPU topology then I see
no reason why we can't make enough information available in the
NUMA topology to let it do the right thing on all systems.


We definitely can. I'm not against it, I just think it's a pretty heavy 
hammer for the problem. But as long as it provides reasonable data I 
don't care.


The thing I care about is providing reasonable data so that the AMD 
Bulldozer CPU can be detected in either way depending on the choice of 
the management app.




Regards,
Daniel





[libvirt] [RFC] Data in the topology element in the capabilities XML

2013-01-16 Thread Peter Krempa

Hi everybody,

a while ago there was a discussion about changing the data that is 
returned in the topology sub-element:


<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='2' threads='2'/>


The data provided here is as of today taken from the nodeinfo detection 
code and thus is really wrong when the fallback mechanisms are used.


To get a useful count, the user has to multiply the data by the number 
of NUMA nodes in the host. With the fallback detection code used for 
nodeinfo the NUMA node count used to get the CPU count should be 1 
instead of the actual number.
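
To make the workaround concrete, here is a rough sketch (not VDSM's or 
libvirt's actual code) of the multiplication a management application has to 
do today; the capabilities excerpt is a made-up host and the element paths 
follow the XML above:

  import xml.etree.ElementTree as ET

  # Hypothetical capabilities excerpt: <cpu><topology/> plus the host NUMA cells.
  CAPS = """
  <capabilities>
    <host>
      <cpu>
        <arch>x86_64</arch>
        <topology sockets='1' cores='2' threads='2'/>
      </cpu>
      <topology>
        <cells num='2'>
          <cell id='0'/>
          <cell id='1'/>
        </cells>
      </topology>
    </host>
  </capabilities>
  """

  def guess_logical_cpus(caps_xml):
      root = ET.fromstring(caps_xml)
      topo = root.find('./host/cpu/topology')
      sockets = int(topo.get('sockets'))   # effectively sockets-per-node today
      cores = int(topo.get('cores'))
      threads = int(topo.get('threads'))
      nodes = len(root.findall('./host/topology/cells/cell')) or 1
      # Multiplying by the NUMA node count is the workaround described above;
      # it is exactly what goes wrong when the nodeinfo fallback paths are hit.
      return sockets * cores * threads * nodes

  print(guess_logical_cpus(CAPS))  # -> 8 for this made-up host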


As Jiri proposed, I think we should change this output to separate 
detection code that will not take into account NUMA nodes for this 
output and will rather provide data as the lscpu command does.


This change will make the data provided by the element standalone and 
also usable in guest XMLs to mirror host's topology.


The meaning of the attributes of that element isn't really documented 
anywhere, so as an additional precaution we should document that.


( http://libvirt.org/formatcaps.html )


For some additional background, please refer to the original discussion 
here:

https://www.redhat.com/archives/libvir-list/2012-March/msg01123.html

Thanks in advance for your comments and suggestions.

Peter



Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

2013-01-16 Thread Daniel P. Berrange
On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:
 Hi everybody,
 
 a while ago there was a discussion about changing the data that is
 returned in the topology sub-element:
 
 <capabilities>
   <host>
     <cpu>
       <arch>x86_64</arch>
       <model>SandyBridge</model>
       <vendor>Intel</vendor>
       <topology sockets='1' cores='2' threads='2'/>
 
 
 The data provided here is as of today taken from the nodeinfo
 detection code and thus is really wrong when the fallback mechanisms
 are used.
 
 To get a useful count, the user has to multiply the data by the
 number of NUMA nodes in the host. With the fallback detection code
 used for nodeinfo the NUMA node count used to get the CPU count
 should be 1 instead of the actual number.
 
 As Jiri proposed, I think we should change this output to separate
 detection code that will not take into account NUMA nodes for this
 output and will rather provide data as the lscpu command does.
 
 This change will make the data provided by the element standalone
 and also usable in guest XMLs to mirror host's topology.

Well there are 2 parts which need to be considered here. What do we report
in the host capabilities, and how do you configure guest XML.

From a historical compatibility pov I don't think we should be changing
the host capabilities at all. Simply document that 'sockets' is treated
as sockets-per-node everywhere, and that it is wrong in the case of
machines where a socket can internally have multiple NUMA nodes.

Apps should be using the separate NUMA topology data in the capabilities
instead of the CPU topology data, to get accurate CPU counts.

For the guest there are two cases to consider. If there is no NUMA in the
guest there is no problem, because total sockets and sockets per node
are the same. In the case where there is NUMA set, we should just ignore
the guest 'sockets' attribute completely, and treat the 'cores' & 'threads'
attributes and vcpu and numa elements as providing canonical data.
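
One way to read that rule, as a sketch with hypothetical numbers (not
libvirt's actual validation logic): with guest NUMA configured, the vcpu
element together with 'cores' and 'threads' carries the information, and
whatever 'sockets' claims is simply not consulted:

  # Hypothetical illustration, not libvirt code: take <vcpu>, 'cores' and
  # 'threads' as canonical and ignore whatever the 'sockets' attribute says.
  def implied_sockets(vcpus, cores, threads):
      assert vcpus % (cores * threads) == 0
      return vcpus // (cores * threads)

  # e.g. <vcpu>16</vcpu> with <topology sockets='1' cores='2' threads='2'/>:
  # the declared sockets='1' is disregarded and 16 / (2 * 2) = 4 is used.
  print(implied_sockets(vcpus=16, cores=2, threads=2))   # -> 4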

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

2013-01-16 Thread Peter Krempa

On 01/16/13 19:11, Daniel P. Berrange wrote:

On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:

Hi everybody,

a while ago there was a discussion about changing the data that is
returned in the topology sub-element:

<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='2' threads='2'/>


The data provided here is as of today taken from the nodeinfo
detection code and thus is really wrong when the fallback mechanisms
are used.

To get a useful count, the user has to multiply the data by the
number of NUMA nodes in the host. With the fallback detection code
used for nodeinfo the NUMA node count used to get the CPU count
should be 1 instead of the actual number.

As Jiri proposed, I think we should change this output to separate
detection code that will not take into account NUMA nodes for this
output and will rather provide data as the lscpu command does.

This change will make the data provided by the element standalone
and also usable in guest XMLs to mirror host's topology.


Well there are 2 parts which need to be considered here. What do we report
in the host capabilities, and how do you configure guest XML.

 From a historical compatibility pov I don't think we should be changing
the host capabilities at all. Simply document that 'sockets' is treated
as sockets-per-node everywhere, and that it is wrong in the case of
machines where a socket can internally have multiple NUMA nodes.


I too am somewhat concerned about changing this output, for historical 
reasons.


Apps should be using the separate NUMA topology data in the capabilities
instead of the CPU topology data, to get accurate CPU counts.


From the NUMA topology the management apps can't tell if the CPU is a 
core or a thread. For example oVirt/VDSM bases the decisions on this 
information.


The management apps tend to avoid using cores as CPUs for the guests for 
performance reasons.


Any other ideas how to provide this kind of information to the mgmt apps?



For the guest there are two cases to consider. If there is no NUMA in the
guest there is no problem, because total sockets and sockets per node
are the same. In the case where there is NUMA set, we should just ignore
the guest 'sockets' attribute completely, and treat the 'cores' & 'threads'
attributes and vcpu and numa elements as providing canonical data.

Daniel





Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

2013-01-16 Thread Daniel P. Berrange
On Wed, Jan 16, 2013 at 07:31:02PM +0100, Peter Krempa wrote:
 On 01/16/13 19:11, Daniel P. Berrange wrote:
 On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:
 Hi everybody,
 
 a while ago there was a discussion about changing the data that is
 returned in the topology sub-element:
 
 <capabilities>
   <host>
     <cpu>
       <arch>x86_64</arch>
       <model>SandyBridge</model>
       <vendor>Intel</vendor>
       <topology sockets='1' cores='2' threads='2'/>
 
 
 The data provided here is as of today taken from the nodeinfo
 detection code and thus is really wrong when the fallback mechanisms
 are used.
 
 To get a useful count, the user has to multiply the data by the
 number of NUMA nodes in the host. With the fallback detection code
 used for nodeinfo the NUMA node count used to get the CPU count
 should be 1 instead of the actual number.
 
 As Jiri proposed, I think we should change this output to separate
 detection code that will not take into account NUMA nodes for this
 output and will rather provide data as the lscpu command does.
 
 This change will make the data provided by the element standalone
 and also usable in guest XMLs to mirror host's topology.
 
 Well there are 2 parts which need to be considered here. What do we report
 in the host capabilities, and how do you configure guest XML.
 
  From a historical compatibility pov I don't think we should be changing
 the host capabilities at all. Simply document that 'sockets' is treated
 as sockets-per-node everywhere, and that it is wrong in the case of
 machines where a socket can internally have multiple NUMA nodes.
 
 I too am somewhat concerned about changing this output, for
 historical reasons.
 
 Apps should be using the separate NUMA topology data in the capabilities
 instead of the CPU topology data, to get accurate CPU counts.
 
 From the NUMA topology the management apps can't tell if the CPU
 is a core or a thread. For example oVirt/VDSM bases the decisions on
 this information.

Then, we should add information to the NUMA topology XML to indicate
which of the child cpu elements are sibling cores or threads.

Perhaps add a 'socket_id' + 'core_id' attribute to every cpu.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

2013-01-16 Thread Peter Krempa

- Original Message -
From: Daniel P. Berrange berra...@redhat.com
To: Peter Krempa pkre...@redhat.com
Cc: Jiri Denemark jdene...@redhat.com, Amador Pahim apa...@redhat.com, 
libvirt-l...@redhat.com, dougsl...@redhat.com
Sent: Wed, 16 Jan 2013 13:39:28 -0500 (EST)
Subject: Re: [libvirt] [RFC] Data in the topology element in the  
capabilities XML

On Wed, Jan 16, 2013 at 07:31:02PM +0100, Peter Krempa wrote:
 On 01/16/13 19:11, Daniel P. Berrange wrote:
 On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:
 Hi everybody,
 
 a while ago there was a discussion about changing the data that is
 returned in the topology sub-element:
 
 <capabilities>
   <host>
     <cpu>
       <arch>x86_64</arch>
       <model>SandyBridge</model>
       <vendor>Intel</vendor>
       <topology sockets='1' cores='2' threads='2'/>
 
 
 The data provided here is as of today taken from the nodeinfo
 detection code and thus is really wrong when the fallback mechanisms
 are used.
 
 To get a useful count, the user has to multiply the data by the
 number of NUMA nodes in the host. With the fallback detection code
 used for nodeinfo the NUMA node count used to get the CPU count
 should be 1 instead of the actual number.
 
 As Jiri proposed, I think we should change this output to separate
 detection code that will not take into account NUMA nodes for this
 output and will rather provide data as the lscpu command does.
 
 This change will make the data provided by the element standalone
 and also usable in guest XMLs to mirror host's topology.
 
 Well there are 2 parts which need to be considered here. What do we report
 in the host capabilities, and how do you configure guest XML.
 
  From a historical compatibility pov I don't think we should be changing
 the host capabilities at all. Simply document that 'sockets' is treated
 as sockets-per-node everywhere, and that it is wrong in the case of
 machines where a socket can internally have multiple NUMA nodes.
 
 I too am somewhat concerned about changing this output, for
 historical reasons.
 
 Apps should be using the separate NUMA topology data in the capabilities
 instead of the CPU topology data, to get accurate CPU counts.
 
 From the NUMA topology the management apps can't tell if the CPU
 is a core or a thread. For example oVirt/VDSM bases the decisions on
 this information.

Then, we should add information to the NUMA topology XML to indicate
which of the child cpu elements are sibling cores or threads.

Perhaps add a 'socket_id' + 'core_id' attribute to every cpu.

In this case, we will also need to add the thread siblings and perhaps even 
core siblings information to allow reliable detection.

Peter

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|



Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

2013-01-16 Thread Daniel P. Berrange
On Wed, Jan 16, 2013 at 02:15:37PM -0500, Peter Krempa wrote:
 
 - Original Message -
 From: Daniel P. Berrange berra...@redhat.com
 To: Peter Krempa pkre...@redhat.com
 Cc: Jiri Denemark jdene...@redhat.com, Amador Pahim apa...@redhat.com, 
 libvirt-l...@redhat.com, dougsl...@redhat.com
 Sent: Wed, 16 Jan 2013 13:39:28 -0500 (EST)
 Subject: Re: [libvirt] [RFC] Data in the topology element in the
 capabilities XML
 
 On Wed, Jan 16, 2013 at 07:31:02PM +0100, Peter Krempa wrote:
  On 01/16/13 19:11, Daniel P. Berrange wrote:
  On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:
  Hi everybody,
  
  a while ago there was a discussion about changing the data that is
  returned in the topology sub-element:
  
  <capabilities>
    <host>
      <cpu>
        <arch>x86_64</arch>
        <model>SandyBridge</model>
        <vendor>Intel</vendor>
        <topology sockets='1' cores='2' threads='2'/>
  
  
  The data provided here is as of today taken from the nodeinfo
  detection code and thus is really wrong when the fallback mechanisms
  are used.
  
  To get a useful count, the user has to multiply the data by the
  number of NUMA nodes in the host. With the fallback detection code
  used for nodeinfo the NUMA node count used to get the CPU count
  should be 1 instead of the actual number.
  
  As Jiri proposed, I think we should change this output to separate
  detection code that will not take into account NUMA nodes for this
  output and will rather provide data as the lscpu command does.
  
  This change will make the data provided by the element standalone
  and also usable in guest XMLs to mirror host's topology.
  
  Well there are 2 parts which need to be considered here. What do we report
  in the host capabilities, and how do you configure guest XML.
  
   From a historical compatibility pov I don't think we should be changing
  the host capabilities at all. Simply document that 'sockets' is treated
  as sockets-per-node everywhere, and that it is wrong in the case of
  machines where a socket can internally have multiple NUMA nodes.
  
  I too am somewhat concerned about changing this output, for
  historical reasons.
  
  Apps should be using the separate NUMA topology data in the capabilities
  instead of the CPU topology data, to get accurate CPU counts.
  
  From the NUMA topology the management apps can't tell if the CPU
  is a core or a thread. For example oVirt/VDSM bases the decisions on
  this information.
 
 Then, we should add information to the NUMA topology XML to indicate
 which of the child cpu elements are sibling cores or threads.
 
 Perhaps add a 'socket_id' + 'core_id' attribute to every cpu.


 In this case, we will also need to add the thread siblings and
 perhaps even core siblings information to allow reliable detection.

The combination of core_id/socket_id lets you determine that. If two
cores have the same socket_id then they are cores or threads within the
same socket. If two cpus have the same socket_id & core_id then they
are threads within the same core.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

2013-01-16 Thread Amador Pahim

On 01/16/2013 04:30 PM, Daniel P. Berrange wrote:

On Wed, Jan 16, 2013 at 02:15:37PM -0500, Peter Krempa wrote:

- Original Message -
From: Daniel P. Berrange berra...@redhat.com
To: Peter Krempa pkre...@redhat.com
Cc: Jiri Denemark jdene...@redhat.com, Amador Pahim apa...@redhat.com, 
libvirt-l...@redhat.com, dougsl...@redhat.com
Sent: Wed, 16 Jan 2013 13:39:28 -0500 (EST)
Subject: Re: [libvirt] [RFC] Data in the topology element in the
capabilities XML

On Wed, Jan 16, 2013 at 07:31:02PM +0100, Peter Krempa wrote:

On 01/16/13 19:11, Daniel P. Berrange wrote:

On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:

Hi everybody,

a while ago there was a discussion about changing the data that is
returned in the topology sub-element:

<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='2' threads='2'/>


The data provided here is as of today taken from the nodeinfo
detection code and thus is really wrong when the fallback mechanisms
are used.

To get a useful count, the user has to multiply the data by the
number of NUMA nodes in the host. With the fallback detection code
used for nodeinfo the NUMA node count used to get the CPU count
should be 1 instead of the actual number.

As Jiri proposed, I think we should change this output to separate
detection code that will not take into account NUMA nodes for this
output and will rather provide data as the lscpu command does.

This change will make the data provided by the element standalone
and also usable in guest XMLs to mirror host's topology.

Well there are 2 parts which need to be considered here. What do we report
in the host capabilities, and how do you configure guest XML.

 From a historical compatibility pov I don't think we should be changing
the host capabilities at all. Simply document that 'sockets' is treated
as sockets-per-node everywhere, and that it is wrong in the case of
machines where a socket can internally have multiple NUMA nodes.

I too am somewhat concerned about changing this output, for
historical reasons.

Apps should be using the separate NUMA topology data in the capabilities
instead of the CPU topology data, to get accurate CPU counts.

 From the NUMA topology the management apps can't tell if the CPU
is a core or a thread. For example oVirt/VDSM bases the decisions on
this information.

Then, we should add information to the NUMA topology XML to indicate
which of the child cpu elements are sibling cores or threads.

Perhaps add a 'socket_id' + 'core_id' attribute to every cpu.



In this case, we will also need to add the thread siblings and
perhaps even core siblings information to allow reliable detection.

The combination of core_id/socket_id lets you determine that. If two
cores have the same socket_id then they are cores or threads within the
same socket. If two cpus have the same socket_id & core_id then they
are threads within the same core.


Not true for the AMD Magny-Cours 6100 series, where different cores can 
share the same physical_id and core_id, and yet they are not threads. 
These processors have two NUMA nodes inside the same package (aka socket) 
that share the same core ID set. Annoying.

Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

2013-01-16 Thread Daniel P. Berrange
On Wed, Jan 16, 2013 at 05:06:21PM -0300, Amador Pahim wrote:
 On 01/16/2013 04:30 PM, Daniel P. Berrange wrote:
 On Wed, Jan 16, 2013 at 02:15:37PM -0500, Peter Krempa wrote:
 - Original Message -
 From: Daniel P. Berrange berra...@redhat.com
 To: Peter Krempa pkre...@redhat.com
 Cc: Jiri Denemark jdene...@redhat.com, Amador Pahim apa...@redhat.com, 
 libvirt-l...@redhat.com, dougsl...@redhat.com
 Sent: Wed, 16 Jan 2013 13:39:28 -0500 (EST)
 Subject: Re: [libvirt] [RFC] Data in the topology element in the  
 capabilities XML
 
 On Wed, Jan 16, 2013 at 07:31:02PM +0100, Peter Krempa wrote:
 On 01/16/13 19:11, Daniel P. Berrange wrote:
 On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:
 Hi everybody,
 
 a while ago there was a discussion about changing the data that is
 returned in the topology sub-element:
 
 <capabilities>
   <host>
     <cpu>
       <arch>x86_64</arch>
       <model>SandyBridge</model>
       <vendor>Intel</vendor>
       <topology sockets='1' cores='2' threads='2'/>
 
 
 The data provided here is as of today taken from the nodeinfo
 detection code and thus is really wrong when the fallback mechanisms
 are used.
 
 To get a useful count, the user has to multiply the data by the
 number of NUMA nodes in the host. With the fallback detection code
 used for nodeinfo the NUMA node count used to get the CPU count
 should be 1 instead of the actual number.
 
 As Jiri proposed, I think we should change this output to separate
 detection code that will not take into account NUMA nodes for this
 output and will rather provide data as the lscpu command does.
 
 This change will make the data provided by the element standalone
 and also usable in guest XMLs to mirror host's topology.
 Well there are 2 parts which need to be considered here. What do we report
 in the host capabilities, and how do you configure guest XML.
 
  From a historical compatibility pov I don't think we should be changing
 the host capabilities at all. Simply document that 'sockets' is treated
 as sockets-per-node everywhere, and that it is wrong in the case of
 machines where a socket can internally have multiple NUMA nodes.
 I too am somewhat concerned about changing this output, for
 historical reasons.
 Apps should be using the separate NUMA topology data in the capabilities
 instead of the CPU topology data, to get accurate CPU counts.
  From the NUMA topology the management apps can't tell if the CPU
 is a core or a thread. For example oVirt/VDSM bases the decisions on
 this information.
 Then, we should add information to the NUMA topology XML to indicate
 which of the child cpu elements are sibling cores or threads.
 
 Perhaps add a 'socket_id' + 'core_id' attribute to every cpu.
 
 In this case, we will also need to add the thread siblings and
 perhaps even core siblings information to allow reliable detection.
 The combination of core_id/socket_id lets you determine that. If two
 cores have the same socket_id then they are cores or threads within the
 same socket. If two cpus have the same socket_id & core_id then they
 are threads within the same core.
 
 Not true for the AMD Magny-Cours 6100 series, where different cores can
 share the same physical_id and core_id, and yet they are not threads.
 These processors have two NUMA nodes inside the same package (aka
 socket) that share the same core ID set. Annoying.

I don't believe there's a problem with that. This example XML
shows a machine with 4 NUMA nodes and 2 sockets; each socket spans
two nodes, and each node has 2 cores with 2 threads each, giving 16
logical CPUs:

<topology>
  <cells num='4'>
    <cell id='0'>
      <cpus num='4'>
        <cpu id='0' socket_id='0' core_id='0'/>
        <cpu id='1' socket_id='0' core_id='0'/>
        <cpu id='2' socket_id='0' core_id='1'/>
        <cpu id='3' socket_id='0' core_id='1'/>
      </cpus>
    </cell>
    <cell id='1'>
      <cpus num='4'>
        <cpu id='4' socket_id='0' core_id='0'/>
        <cpu id='5' socket_id='0' core_id='0'/>
        <cpu id='6' socket_id='0' core_id='1'/>
        <cpu id='7' socket_id='0' core_id='1'/>
      </cpus>
    </cell>
    <cell id='2'>
      <cpus num='4'>
        <cpu id='8'  socket_id='1' core_id='0'/>
        <cpu id='9'  socket_id='1' core_id='0'/>
        <cpu id='10' socket_id='1' core_id='1'/>
        <cpu id='11' socket_id='1' core_id='1'/>
      </cpus>
    </cell>
    <cell id='3'>
      <cpus num='4'>
        <cpu id='12' socket_id='1' core_id='0'/>
        <cpu id='13' socket_id='1' core_id='0'/>
        <cpu id='14' socket_id='1' core_id='1'/>
        <cpu id='15' socket_id='1' core_id='1'/>
      </cpus>
    </cell>
  </cells>
</topology>

I believe there's enough info there to determine all the co-location
aspects of all the sockets/cores/threads involved.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- 

Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

2013-01-16 Thread Peter Krempa

On 01/16/13 21:24, Daniel P. Berrange wrote:

On Wed, Jan 16, 2013 at 05:06:21PM -0300, Amador Pahim wrote:

On 01/16/2013 04:30 PM, Daniel P. Berrange wrote:

On Wed, Jan 16, 2013 at 02:15:37PM -0500, Peter Krempa wrote:

- Original Message -
From: Daniel P. Berrange berra...@redhat.com
To: Peter Krempa pkre...@redhat.com
Cc: Jiri Denemark jdene...@redhat.com, Amador Pahim apa...@redhat.com, 
libvirt-l...@redhat.com, dougsl...@redhat.com
Sent: Wed, 16 Jan 2013 13:39:28 -0500 (EST)
Subject: Re: [libvirt] [RFC] Data in the topology element in the
capabilities XML

On Wed, Jan 16, 2013 at 07:31:02PM +0100, Peter Krempa wrote:

On 01/16/13 19:11, Daniel P. Berrange wrote:

On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:

Hi everybody,

a while ago there was a discussion about changing the data that is
returned in the topology sub-element:

<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='2' threads='2'/>


The data provided here is as of today taken from the nodeinfo
detection code and thus is really wrong when the fallback mechanisms
are used.

To get a useful count, the user has to multiply the data by the
number of NUMA nodes in the host. With the fallback detection code
used for nodeinfo the NUMA node count used to get the CPU count
should be 1 instead of the actual number.

As Jiri proposed, I think we should change this output to separate
detection code that will not take into account NUMA nodes for this
output and will rather provide data as the lscpu command does.

This change will make the data provided by the element standalone
and also usable in guest XMLs to mirror host's topology.

Well there are 2 parts which need to be considered here. What do we report
in the host capabilities, and how do you configure guest XML.

 From a historical compatibility pov I don't think we should be changing
the host capabilities at all. Simply document that 'sockets' is treated
as sockets-per-node everywhere, and that it is wrong in the case of
machines where a socket can internally have multiple NUMA nodes.

I too am somewhat concerned about changing this output, for
historical reasons.

Apps should be using the separate NUMA topology data in the capabilities
instead of the CPU topology data, to get accurate CPU counts.

 From the NUMA topology the management apps can't tell if the CPU
is a core or a thread. For example oVirt/VDSM bases the decisions on
this information.

Then, we should add information to the NUMA topology XML to indicate
which of the child cpu elements are sibling cores or threads.

Perhaps add a 'socket_id' + 'core_id' attribute to every cpu.



In this case, we will also need to add the thread siblings and
perhaps even core siblings information to allow reliable detection.

The combination of core_id/socket_id lets you determine that. If two
cores have the same socket_id then they are cores or threads within the
same socket. If two cpus have the same socket_id & core_id then they
are threads within the same core.


Not true for the AMD Magny-Cours 6100 series, where different cores can
share the same physical_id and core_id, and yet they are not threads.
These processors have two NUMA nodes inside the same package (aka
socket) that share the same core ID set. Annoying.


I don't believe there's a problem with that. This example XML
shows a machine with 4 NUMA nodes and 2 sockets; each socket spans
two nodes, and each node has 2 cores with 2 threads each, giving 16
logical CPUs:

 <topology>
   <cells num='4'>
     <cell id='0'>
       <cpus num='4'>
         <cpu id='0' socket_id='0' core_id='0'/>
         <cpu id='1' socket_id='0' core_id='0'/>
         <cpu id='2' socket_id='0' core_id='1'/>
         <cpu id='3' socket_id='0' core_id='1'/>
       </cpus>
     </cell>
     <cell id='1'>
       <cpus num='4'>
         <cpu id='4' socket_id='0' core_id='0'/>
         <cpu id='5' socket_id='0' core_id='0'/>
         <cpu id='6' socket_id='0' core_id='1'/>
         <cpu id='7' socket_id='0' core_id='1'/>
       </cpus>
     </cell>
     <cell id='2'>
       <cpus num='4'>
         <cpu id='8'  socket_id='1' core_id='0'/>
         <cpu id='9'  socket_id='1' core_id='0'/>
         <cpu id='10' socket_id='1' core_id='1'/>
         <cpu id='11' socket_id='1' core_id='1'/>
       </cpus>
     </cell>
     <cell id='3'>
       <cpus num='4'>
         <cpu id='12' socket_id='1' core_id='0'/>
         <cpu id='13' socket_id='1' core_id='0'/>
         <cpu id='14' socket_id='1' core_id='1'/>
         <cpu id='15' socket_id='1' core_id='1'/>
       </cpus>
     </cell>
   </cells>
 </topology>

I believe there's enough info there to determine all the co-location
aspects of all the sockets/cores/threads involved.


Well, not for all machines in the wild out there. This is very similar 
to the approach libvirt uses now to detect the topology, and it is not 
enough to detect threads on AMD Bulldozer as