Re: [libvirt] [RFC] Data in the topology element in the capabilities XML
On Thu, Jan 17, 2013 at 12:12:35AM +0100, Peter Krempa wrote:
On 01/16/13 21:24, Daniel P. Berrange wrote:
On Wed, Jan 16, 2013 at 05:06:21PM -0300, Amador Pahim wrote:
On 01/16/2013 04:30 PM, Daniel P. Berrange wrote:
On Wed, Jan 16, 2013 at 02:15:37PM -0500, Peter Krempa wrote:

----- Original Message -----
From: Daniel P. Berrange berra...@redhat.com
To: Peter Krempa pkre...@redhat.com
Cc: Jiri Denemark jdene...@redhat.com, Amador Pahim apa...@redhat.com, libvirt-l...@redhat.com, dougsl...@redhat.com
Sent: Wed, 16 Jan 2013 13:39:28 -0500 (EST)
Subject: Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

On Wed, Jan 16, 2013 at 07:31:02PM +0100, Peter Krempa wrote:
On 01/16/13 19:11, Daniel P. Berrange wrote:
On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:

Hi everybody,

a while ago there was a discussion about changing the data that is returned in the topology sub-element:

<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='2' threads='2'/>

The data provided here is as of today taken from the nodeinfo detection code and thus is really wrong when the fallback mechanisms are used. To get a useful count, the user has to multiply the data by the number of NUMA nodes in the host. With the fallback detection code used for nodeinfo, the NUMA node count used to get the CPU count should be 1 instead of the actual number.

As Jiri proposed, I think we should switch this output to separate detection code that will not take NUMA nodes into account and will instead provide the data the way the lscpu command does. This change will make the data provided by the element standalone and also usable in guest XMLs to mirror the host's topology.

Well, there are 2 parts which need to be considered here: what do we report in the host capabilities, and how do you configure guest XML.

From a historical compatibility POV I don't think we should be changing the host capabilities at all. Simply document that 'sockets' is treated as sockets-per-node everywhere, and that it is wrong in the case of machines where a socket can internally contain multiple NUMA nodes.

I too am somewhat concerned about changing this output, for historical reasons.

Apps should be using the separate NUMA topology data in the capabilities instead of the CPU topology data, to get accurate CPU counts.

From the NUMA topology the management apps can't tell if the CPU is a core or a thread. For example, oVirt/VDSM bases its decisions on this information.

Then we should add information to the NUMA topology XML to indicate which of the child <cpu> elements are sibling cores or threads. Perhaps add a 'socket_id' + 'core_id' attribute to every <cpu>.

In this case we will also need to add the thread siblings, and perhaps even the core siblings, information to allow reliable detection.

The combination of core_id/socket_id lets you determine that. If two CPUs have the same socket_id, then they are cores or threads within the same socket. If two CPUs have the same socket_id and core_id, then they are threads within the same core.

Not true for the AMD Magny-Cours 6100 series, where different cores can share the same physical_id and core_id, and they are not threads. This processor has two NUMA nodes inside the same package (aka socket), and they share the same core ID set. Annoying.

I don't believe there's a problem with that.
This example XML shows a machine with 4 NUMA nodes and 2 sockets, each node containing 2 cores with 2 threads each, giving 16 logical CPUs:

<topology>
  <cells num='4'>
    <cell id='0'>
      <cpus num='4'>
        <cpu id='0' socket_id='0' core_id='0'/>
        <cpu id='1' socket_id='0' core_id='0'/>
        <cpu id='2' socket_id='0' core_id='1'/>
        <cpu id='3' socket_id='0' core_id='1'/>
      </cpus>
    </cell>
    <cell id='1'>
      <cpus num='4'>
        <cpu id='4' socket_id='0' core_id='0'/>
        <cpu id='5' socket_id='0' core_id='0'/>
        <cpu id='6' socket_id='0' core_id='1'/>
        <cpu id='7' socket_id='0' core_id='1'/>
      </cpus>
    </cell>
    <cell id='2'>
      <cpus num='4'>
        <cpu id='8' socket_id='1' core_id='0'/>
        <cpu id='9' socket_id='1' core_id='0'/>
        <cpu id='10' socket_id='1' core_id='1'/>
        <cpu id='11' socket_id='1' core_id='1'/>
      </cpus>
    </cell>
    <cell id='3'>
      <cpus num='4'>
        <cpu id='12' socket_id='1' core_id='0'/>
        <cpu id='13' socket_id='1' core_id='0'/>
        <cpu id='14' socket_id='1' core_id='1'/>
        <cpu id='15' socket_id='1' core_id='1'/>
      </cpus>
    </cell>
  </cells>
</topology>

I believe there's enough info there to determine all the co-location aspects of all the sockets/cores/threads involved.
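[Editor's illustration] As a concrete sketch of how the proposed socket_id/core_id attributes could be consumed, the following minimal Python fragment (not actual libvirt or VDSM code; the helper name and the embedded XML are illustrative) groups logical CPUs into thread-sibling sets. Grouping by cell as well as socket_id/core_id is what keeps the Magny-Cours case unambiguous: CPUs in different NUMA cells never land in the same group even when their IDs collide.

# Hypothetical sketch -- not libvirt or VDSM code. Groups logical CPUs
# from the proposed <topology> XML above into thread-sibling sets.
import xml.etree.ElementTree as ET
from collections import defaultdict

CAPS_TOPOLOGY = """
<topology>
  <cells num='1'>
    <cell id='0'>
      <cpus num='4'>
        <cpu id='0' socket_id='0' core_id='0'/>
        <cpu id='1' socket_id='0' core_id='0'/>
        <cpu id='2' socket_id='0' core_id='1'/>
        <cpu id='3' socket_id='0' core_id='1'/>
      </cpus>
    </cell>
  </cells>
</topology>
"""

def sibling_map(topology_xml):
    # CPUs sharing (cell, socket_id, core_id) are thread siblings;
    # CPUs sharing only (cell, socket_id) are cores in the same socket.
    root = ET.fromstring(topology_xml)
    groups = defaultdict(list)
    for cell in root.iter('cell'):
        for cpu in cell.iter('cpu'):
            key = (cell.get('id'), cpu.get('socket_id'), cpu.get('core_id'))
            groups[key].append(int(cpu.get('id')))
    return dict(groups)

print(sibling_map(CAPS_TOPOLOGY))
# -> {('0', '0', '0'): [0, 1], ('0', '0', '1'): [2, 3]}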
Re: [libvirt] [RFC] Data in the topology element in the capabilities XML
On 01/17/13 10:36, Daniel P. Berrange wrote:

Well, not for all machines in the wild out there. This is very similar to the approach libvirt uses now to detect the topology, and it is not enough to detect threads on AMD Bulldozer, as the CPUs corresponding to the threads have different core_ids (they are also considered cores from the perspective of the kernel). This is unfortunate for virtualization management tools such as oVirt that still consider the AMD Bulldozer module to be 1 core with two threads, even though it registers as two cores. For AMD Bulldozer to be detected correctly, we would need to expose the thread_ids along with the thread siblings information to determine that the two threads belong together.

NB, the socket_id / core_id values in the above XML are *not* intended to be in any way related to the similarly named values in /proc/cpuinfo. They are values libvirt assigns to show the topology accurately.

Hm, in that case I'm not sure it's worth bothering with detecting the topology and then (possibly) making up the data provided in the XML. The management applications will then have to calculate the topology from our data again just to know the topology. For other possible uses of the data, the management apps would still need to re-detect it on their own if they need (for some reason) the actual data.

Also, even with non-real data in these fields, it won't enable us to reflect the topology of AMD Bulldozer reliably. We would have to choose whether we'd like to report the modules as cores or as threads, and each of these choices will possibly make someone complain that they don't like the choice.

Peter
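[Editor's illustration] For reference, the kernel data Peter refers to is exposed under sysfs. A minimal sketch of reading it follows; the sysfs paths are standard Linux, while the helper name and example values are illustrative.

# Minimal sketch of reading the kernel's thread-sibling data mentioned
# above. Paths are standard Linux sysfs; values vary by host.
import glob
import os

def thread_siblings():
    siblings = {}
    for topo in glob.glob('/sys/devices/system/cpu/cpu[0-9]*/topology'):
        cpu = int(os.path.basename(os.path.dirname(topo))[3:])  # 'cpuN' -> N
        with open(os.path.join(topo, 'thread_siblings_list')) as f:
            siblings[cpu] = f.read().strip()  # e.g. '0-1'
    return siblings

# On a Bulldozer module the two sibling CPUs list each other here even
# though their core_id files contain different values -- the ambiguity
# Peter describes.
print(thread_siblings())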
Re: [libvirt] [RFC] Data in the topology element in the capabilities XML
On Thu, Jan 17, 2013 at 03:41:44PM +0100, Peter Krempa wrote:
On 01/17/13 10:36, Daniel P. Berrange wrote:

Well, not for all machines in the wild out there. This is very similar to the approach libvirt uses now to detect the topology, and it is not enough to detect threads on AMD Bulldozer, as the CPUs corresponding to the threads have different core_ids (they are also considered cores from the perspective of the kernel). This is unfortunate for virtualization management tools such as oVirt that still consider the AMD Bulldozer module to be 1 core with two threads, even though it registers as two cores. For AMD Bulldozer to be detected correctly, we would need to expose the thread_ids along with the thread siblings information to determine that the two threads belong together.

NB, the socket_id / core_id values in the above XML are *not* intended to be in any way related to the similarly named values in /proc/cpuinfo. They are values libvirt assigns to show the topology accurately.

Hm, in that case I'm not sure it's worth bothering with detecting the topology and then (possibly) making up the data provided in the XML. The management applications will then have to calculate the topology from our data again just to know the topology. For other possible uses of the data, the management apps would still need to re-detect it on their own if they need (for some reason) the actual data.

The point is that libvirt is intended to provide a representation of the data that is independent of the underlying technology. Apps using libvirt can't expect to read /proc/cpuinfo directly and then expect libvirt to match that exactly. They should be using the libvirt APIs/XML for this exclusively; if there is something missing which prevents them from doing so, then we need to add it.

Also, even with non-real data in these fields, it won't enable us to reflect the topology of AMD Bulldozer reliably. We would have to choose whether we'd like to report the modules as cores or as threads, and each of these choices will possibly make someone complain that they don't like the choice.

I think you're creating a problem where none exists - there is a clear difference between what a hyperthread is vs what a core is, so there is no ambiguity in what we choose to use. We must simply pick the one that is right.

Daniel
Re: [libvirt] [RFC] Data in the topology element in the capabilities XML
On 01/17/13 15:48, Daniel P. Berrange wrote:
On Thu, Jan 17, 2013 at 03:41:44PM +0100, Peter Krempa wrote:
On 01/17/13 10:36, Daniel P. Berrange wrote:

Well, not for all machines in the wild out there. This is very similar to the approach libvirt uses now to detect the topology, and it is not enough to detect threads on AMD Bulldozer, as the CPUs corresponding to the threads have different core_ids (they are also considered cores from the perspective of the kernel). This is unfortunate for virtualization management tools such as oVirt that still consider the AMD Bulldozer module to be 1 core with two threads, even though it registers as two cores. For AMD Bulldozer to be detected correctly, we would need to expose the thread_ids along with the thread siblings information to determine that the two threads belong together.

NB, the socket_id / core_id values in the above XML are *not* intended to be in any way related to the similarly named values in /proc/cpuinfo. They are values libvirt assigns to show the topology accurately.

Hm, in that case I'm not sure it's worth bothering with detecting the topology and then (possibly) making up the data provided in the XML. The management applications will then have to calculate the topology from our data again just to know the topology. For other possible uses of the data, the management apps would still need to re-detect it on their own if they need (for some reason) the actual data.

The point is that libvirt is intended to provide a representation of the data that is independent of the underlying technology. Apps using libvirt can't expect to read /proc/cpuinfo directly and then expect libvirt to match that exactly. They should be using the libvirt APIs/XML for this exclusively; if there is something missing which prevents them from doing so, then we need to add it.

Okay, this is fair enough and might work even for non-symmetric multiprocessor systems.

Also, even with non-real data in these fields, it won't enable us to reflect the topology of AMD Bulldozer reliably. We would have to choose whether we'd like to report the modules as cores or as threads, and each of these choices will possibly make someone complain that they don't like the choice.

I think you're creating a problem where none exists - there is a clear difference between what a hyperthread is vs what a core is, so there is no ambiguity in what we choose to use. We must simply pick the one that is right.

I beg to differ:

"This technology is informally called CMT (Clustered Multi-Thread), formally called a module by AMD's marketing, but whose real name is Clustered Integer Core Technology. In terms of hardware complexity and functionality, this module is midway between a true dual-core processor with its integer power (each thread having a fully independent integer core) and a single-core processor with the SMT ability, which can create a dual-thread processor but with the power of one (each thread shares the resources of the core with the other thread)."

(source: http://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)#BULLDOZER_Core_.28aka_Module.29 )

oVirt/vdsm considers the module to be a core with two threads, where others consider it more like separate cores. The performance depends on the type of task that is being run on the module. The kernel also uses an ambiguous description, where the core IDs of the two threads of a module differ, yet the thread_siblings field of each points at the other.
Peter
Re: [libvirt] [RFC] Data in the topology element in the capabilities XML
On 01/17/13 16:43, Daniel P. Berrange wrote:

I beg to differ:

"This technology is informally called CMT (Clustered Multi-Thread), formally called a module by AMD's marketing, but whose real name is Clustered Integer Core Technology. In terms of hardware complexity and functionality, this module is midway between a true dual-core processor with its integer power (each thread having a fully independent integer core) and a single-core processor with the SMT ability, which can create a dual-thread processor but with the power of one (each thread shares the resources of the core with the other thread)."

(source: http://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)#BULLDOZER_Core_.28aka_Module.29 )

oVirt/vdsm considers the module to be a core with two threads, where others consider it more like separate cores. The performance depends on the type of task that is being run on the module.

So what information does VDSM use to identify this hardware topology?

VDSM uses the topology element (a subelement of cpu) in the capabilities and multiplies the numbers by the number of NUMA nodes. As the data there is taken from nodeinfo, this breaks on some systems.

In some form or another we need to be providing the more detailed and accurate topology information in the capabilities NUMA description. What does 'hwloc --no-io --no-caches' show on a Bulldozer machine?

# hwloc-info --no-io --no-caches
depth 0:        1 Machine (type #1)
 depth 1:       2 Socket (type #3)
  depth 2:      4 NUMANode (type #2)
   depth 3:     32 Core (type #5)
    depth 4:    32 PU (type #6)

hwloc recognizes them as cores ...

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2

... lscpu sees them as threads ...

NUMA node(s):          4
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 1
Stepping:              2
CPU MHz:               1400.000
BogoMIPS:              4389.80
Virtualization:        AMD-V
L1d cache:             16K
L1i cache:             64K
L2 cache:              2048K
L3 cache:              6144K
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
NUMA node2 CPU(s):     16-23
NUMA node3 CPU(s):     24-31
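[Editor's illustration] Both tools are internally consistent; they just disagree on the core/thread split. A quick check, using nothing beyond the output above:

# Both interpretations of the same Bulldozer box multiply out to the
# 32 processing units that hwloc reports at depth 4.
sockets = 2
lscpu_view = sockets * 8 * 2    # 8 cores/socket x 2 threads/core = 32
hwloc_view = sockets * 16 * 1   # 16 cores/socket, no threads     = 32
assert lscpu_view == hwloc_view == 32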
Re: [libvirt] [RFC] Data in the topology element in the capabilities XML
On Thu, Jan 17, 2013 at 05:14:30PM +0100, Peter Krempa wrote:
On 01/17/13 16:43, Daniel P. Berrange wrote:

VDSM uses the topology element (a subelement of cpu) in the capabilities and multiplies the numbers by the number of NUMA nodes. As the data there is taken from nodeinfo, this breaks on some systems.

If it is merely looking at the current CPU topology then I see no reason why we can't make enough information available in the NUMA topology to let it do the right thing on all systems.

Regards,
Daniel
Re: [libvirt] [RFC] Data in the topology element in the capabilities XML
On 01/17/13 17:56, Daniel P. Berrange wrote:
On Thu, Jan 17, 2013 at 05:14:30PM +0100, Peter Krempa wrote:
On 01/17/13 16:43, Daniel P. Berrange wrote:

VDSM uses the topology element (a subelement of cpu) in the capabilities and multiplies the numbers by the number of NUMA nodes. As the data there is taken from nodeinfo, this breaks on some systems.

If it is merely looking at the current CPU topology then I see no reason why we can't make enough information available in the NUMA topology to let it do the right thing on all systems.

We definitely can. I'm not against it; I just think it's a pretty heavy hammer for the problem. But as long as it provides reasonable data, I don't care. The thing I care about is providing reasonable data, so that the AMD Bulldozer CPU can be detected either way, depending on the choice of the management app.
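[Editor's illustration] In code terms, "detected either way" could look like the following sketch, with the app rather than libvirt picking the interpretation. The helper is hypothetical and consumes a sibling map like the one built in the earlier sketch.

# Hypothetical sketch: two ways a management app might count CPUs,
# given a sibling map of {group key: [logical CPU ids]}.
def count_cpus(siblings, modules_are_threads=True):
    if modules_are_threads:
        # one schedulable core per module; siblings treated as threads
        return len(siblings)
    # every processing unit counted as a core in its own right
    return sum(len(ids) for ids in siblings.values())

bulldozer_module = {('0', '0', '0'): [0, 1]}  # two PUs in one module
print(count_cpus(bulldozer_module, True))     # -> 1 (core + threads view)
print(count_cpus(bulldozer_module, False))    # -> 2 (separate-cores view)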
[libvirt] [RFC] Data in the topology element in the capabilities XML
Hi everybody,

a while ago there was a discussion about changing the data that is returned in the topology sub-element:

<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='2' threads='2'/>

The data provided here is as of today taken from the nodeinfo detection code and thus is really wrong when the fallback mechanisms are used. To get a useful count, the user has to multiply the data by the number of NUMA nodes in the host. With the fallback detection code used for nodeinfo, the NUMA node count used to get the CPU count should be 1 instead of the actual number.

As Jiri proposed, I think we should switch this output to separate detection code that will not take NUMA nodes into account and will instead provide the data the way the lscpu command does. This change will make the data provided by the element standalone and also usable in guest XMLs to mirror the host's topology.

The meaning of the attributes of that element isn't really documented anywhere, so as an additional precaution we should document that. ( http://libvirt.org/formatcaps.html )

For some additional background, please refer to the original discussion here:
https://www.redhat.com/archives/libvir-list/2012-March/msg01123.html

Thanks in advance for your comments and suggestions.

Peter
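[Editor's illustration] To make the "multiply by NUMA nodes" workaround concrete, here is a sketch of what a consumer has to do today. It is illustrative Python, not VDSM's actual code; the element paths follow the capabilities XML shown above.

# Sketch of the workaround described above; not VDSM's actual code.
import xml.etree.ElementTree as ET

def total_cpus(caps_xml):
    caps = ET.fromstring(caps_xml)
    topo = caps.find('./host/cpu/topology')
    cells = caps.find('./host/topology/cells')
    nodes = int(cells.get('num')) if cells is not None else 1
    per_node = (int(topo.get('sockets')) *
                int(topo.get('cores')) *
                int(topo.get('threads')))
    # Wrong when the nodeinfo fallback path was used: those numbers are
    # already host-wide, so the multiplier should have been 1.
    return per_node * nodes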
Re: [libvirt] [RFC] Data in the topology element in the capabilities XML
On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:

Hi everybody,

a while ago there was a discussion about changing the data that is returned in the topology sub-element:

<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='2' threads='2'/>

The data provided here is as of today taken from the nodeinfo detection code and thus is really wrong when the fallback mechanisms are used. To get a useful count, the user has to multiply the data by the number of NUMA nodes in the host. With the fallback detection code used for nodeinfo, the NUMA node count used to get the CPU count should be 1 instead of the actual number.

As Jiri proposed, I think we should switch this output to separate detection code that will not take NUMA nodes into account and will instead provide the data the way the lscpu command does. This change will make the data provided by the element standalone and also usable in guest XMLs to mirror the host's topology.

Well, there are 2 parts which need to be considered here: what do we report in the host capabilities, and how do you configure guest XML.

From a historical compatibility POV I don't think we should be changing the host capabilities at all. Simply document that 'sockets' is treated as sockets-per-node everywhere, and that it is wrong in the case of machines where a socket can internally contain multiple NUMA nodes. Apps should be using the separate NUMA topology data in the capabilities instead of the CPU topology data, to get accurate CPU counts.

For the guest there are two cases to consider. If there is no NUMA in the guest, there is no problem, because total sockets and sockets-per-node are the same. In the case where NUMA is set, we should just ignore the guest 'sockets' attribute completely, and treat the 'cores' + 'threads' attributes and the <vcpu> and <numa> elements as providing the canonical data.

Daniel
Re: [libvirt] [RFC] Data in the topology element in the capabilities XML
On 01/16/13 19:11, Daniel P. Berrange wrote:
On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:

Hi everybody,

a while ago there was a discussion about changing the data that is returned in the topology sub-element:

<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='2' threads='2'/>

The data provided here is as of today taken from the nodeinfo detection code and thus is really wrong when the fallback mechanisms are used. To get a useful count, the user has to multiply the data by the number of NUMA nodes in the host. With the fallback detection code used for nodeinfo, the NUMA node count used to get the CPU count should be 1 instead of the actual number.

As Jiri proposed, I think we should switch this output to separate detection code that will not take NUMA nodes into account and will instead provide the data the way the lscpu command does. This change will make the data provided by the element standalone and also usable in guest XMLs to mirror the host's topology.

Well, there are 2 parts which need to be considered here: what do we report in the host capabilities, and how do you configure guest XML.

From a historical compatibility POV I don't think we should be changing the host capabilities at all. Simply document that 'sockets' is treated as sockets-per-node everywhere, and that it is wrong in the case of machines where a socket can internally contain multiple NUMA nodes.

I too am somewhat concerned about changing this output, for historical reasons.

Apps should be using the separate NUMA topology data in the capabilities instead of the CPU topology data, to get accurate CPU counts.

From the NUMA topology the management apps can't tell if the CPU is a core or a thread. For example, oVirt/VDSM bases its decisions on this information. The management apps tend to avoid using cores as CPUs for the guests, for performance reasons. Any other ideas on how to provide this kind of information to the mgmt apps?

For the guest there are two cases to consider. If there is no NUMA in the guest, there is no problem, because total sockets and sockets-per-node are the same. In the case where NUMA is set, we should just ignore the guest 'sockets' attribute completely, and treat the 'cores' + 'threads' attributes and the <vcpu> and <numa> elements as providing the canonical data.

Daniel
Re: [libvirt] [RFC] Data in the topology element in the capabilities XML
On Wed, Jan 16, 2013 at 07:31:02PM +0100, Peter Krempa wrote:
On 01/16/13 19:11, Daniel P. Berrange wrote:
On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:

Hi everybody,

a while ago there was a discussion about changing the data that is returned in the topology sub-element:

<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='2' threads='2'/>

The data provided here is as of today taken from the nodeinfo detection code and thus is really wrong when the fallback mechanisms are used. To get a useful count, the user has to multiply the data by the number of NUMA nodes in the host. With the fallback detection code used for nodeinfo, the NUMA node count used to get the CPU count should be 1 instead of the actual number.

As Jiri proposed, I think we should switch this output to separate detection code that will not take NUMA nodes into account and will instead provide the data the way the lscpu command does. This change will make the data provided by the element standalone and also usable in guest XMLs to mirror the host's topology.

Well, there are 2 parts which need to be considered here: what do we report in the host capabilities, and how do you configure guest XML.

From a historical compatibility POV I don't think we should be changing the host capabilities at all. Simply document that 'sockets' is treated as sockets-per-node everywhere, and that it is wrong in the case of machines where a socket can internally contain multiple NUMA nodes.

I too am somewhat concerned about changing this output, for historical reasons.

Apps should be using the separate NUMA topology data in the capabilities instead of the CPU topology data, to get accurate CPU counts.

From the NUMA topology the management apps can't tell if the CPU is a core or a thread. For example, oVirt/VDSM bases its decisions on this information.

Then we should add information to the NUMA topology XML to indicate which of the child <cpu> elements are sibling cores or threads. Perhaps add a 'socket_id' + 'core_id' attribute to every <cpu>.

Regards,
Daniel
Re: [libvirt] [RFC] Data in the topology element in the capabilities XML
----- Original Message -----
From: Daniel P. Berrange berra...@redhat.com
To: Peter Krempa pkre...@redhat.com
Cc: Jiri Denemark jdene...@redhat.com, Amador Pahim apa...@redhat.com, libvirt-l...@redhat.com, dougsl...@redhat.com
Sent: Wed, 16 Jan 2013 13:39:28 -0500 (EST)
Subject: Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

On Wed, Jan 16, 2013 at 07:31:02PM +0100, Peter Krempa wrote:
On 01/16/13 19:11, Daniel P. Berrange wrote:
On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:

Hi everybody,

a while ago there was a discussion about changing the data that is returned in the topology sub-element:

<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='2' threads='2'/>

The data provided here is as of today taken from the nodeinfo detection code and thus is really wrong when the fallback mechanisms are used. To get a useful count, the user has to multiply the data by the number of NUMA nodes in the host. With the fallback detection code used for nodeinfo, the NUMA node count used to get the CPU count should be 1 instead of the actual number.

As Jiri proposed, I think we should switch this output to separate detection code that will not take NUMA nodes into account and will instead provide the data the way the lscpu command does. This change will make the data provided by the element standalone and also usable in guest XMLs to mirror the host's topology.

Well, there are 2 parts which need to be considered here: what do we report in the host capabilities, and how do you configure guest XML.

From a historical compatibility POV I don't think we should be changing the host capabilities at all. Simply document that 'sockets' is treated as sockets-per-node everywhere, and that it is wrong in the case of machines where a socket can internally contain multiple NUMA nodes.

I too am somewhat concerned about changing this output, for historical reasons.

Apps should be using the separate NUMA topology data in the capabilities instead of the CPU topology data, to get accurate CPU counts.

From the NUMA topology the management apps can't tell if the CPU is a core or a thread. For example, oVirt/VDSM bases its decisions on this information.

Then we should add information to the NUMA topology XML to indicate which of the child <cpu> elements are sibling cores or threads. Perhaps add a 'socket_id' + 'core_id' attribute to every <cpu>.

In this case we will also need to add the thread siblings, and perhaps even the core siblings, information to allow reliable detection.

Peter
Re: [libvirt] [RFC] Data in the topology element in the capabilities XML
On Wed, Jan 16, 2013 at 02:15:37PM -0500, Peter Krempa wrote:

----- Original Message -----
From: Daniel P. Berrange berra...@redhat.com
To: Peter Krempa pkre...@redhat.com
Cc: Jiri Denemark jdene...@redhat.com, Amador Pahim apa...@redhat.com, libvirt-l...@redhat.com, dougsl...@redhat.com
Sent: Wed, 16 Jan 2013 13:39:28 -0500 (EST)
Subject: Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

On Wed, Jan 16, 2013 at 07:31:02PM +0100, Peter Krempa wrote:
On 01/16/13 19:11, Daniel P. Berrange wrote:
On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:

Hi everybody,

a while ago there was a discussion about changing the data that is returned in the topology sub-element:

<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='2' threads='2'/>

The data provided here is as of today taken from the nodeinfo detection code and thus is really wrong when the fallback mechanisms are used. To get a useful count, the user has to multiply the data by the number of NUMA nodes in the host. With the fallback detection code used for nodeinfo, the NUMA node count used to get the CPU count should be 1 instead of the actual number.

As Jiri proposed, I think we should switch this output to separate detection code that will not take NUMA nodes into account and will instead provide the data the way the lscpu command does. This change will make the data provided by the element standalone and also usable in guest XMLs to mirror the host's topology.

Well, there are 2 parts which need to be considered here: what do we report in the host capabilities, and how do you configure guest XML.

From a historical compatibility POV I don't think we should be changing the host capabilities at all. Simply document that 'sockets' is treated as sockets-per-node everywhere, and that it is wrong in the case of machines where a socket can internally contain multiple NUMA nodes.

I too am somewhat concerned about changing this output, for historical reasons.

Apps should be using the separate NUMA topology data in the capabilities instead of the CPU topology data, to get accurate CPU counts.

From the NUMA topology the management apps can't tell if the CPU is a core or a thread. For example, oVirt/VDSM bases its decisions on this information.

Then we should add information to the NUMA topology XML to indicate which of the child <cpu> elements are sibling cores or threads. Perhaps add a 'socket_id' + 'core_id' attribute to every <cpu>.

In this case we will also need to add the thread siblings, and perhaps even the core siblings, information to allow reliable detection.

The combination of core_id/socket_id lets you determine that. If two CPUs have the same socket_id, then they are cores or threads within the same socket. If two CPUs have the same socket_id and core_id, then they are threads within the same core.

Daniel
Re: [libvirt] [RFC] Data in the topology element in the capabilities XML
On 01/16/2013 04:30 PM, Daniel P. Berrange wrote:
On Wed, Jan 16, 2013 at 02:15:37PM -0500, Peter Krempa wrote:

----- Original Message -----
From: Daniel P. Berrange berra...@redhat.com
To: Peter Krempa pkre...@redhat.com
Cc: Jiri Denemark jdene...@redhat.com, Amador Pahim apa...@redhat.com, libvirt-l...@redhat.com, dougsl...@redhat.com
Sent: Wed, 16 Jan 2013 13:39:28 -0500 (EST)
Subject: Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

On Wed, Jan 16, 2013 at 07:31:02PM +0100, Peter Krempa wrote:
On 01/16/13 19:11, Daniel P. Berrange wrote:
On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:

Hi everybody,

a while ago there was a discussion about changing the data that is returned in the topology sub-element:

<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='2' threads='2'/>

The data provided here is as of today taken from the nodeinfo detection code and thus is really wrong when the fallback mechanisms are used. To get a useful count, the user has to multiply the data by the number of NUMA nodes in the host. With the fallback detection code used for nodeinfo, the NUMA node count used to get the CPU count should be 1 instead of the actual number.

As Jiri proposed, I think we should switch this output to separate detection code that will not take NUMA nodes into account and will instead provide the data the way the lscpu command does. This change will make the data provided by the element standalone and also usable in guest XMLs to mirror the host's topology.

Well, there are 2 parts which need to be considered here: what do we report in the host capabilities, and how do you configure guest XML.

From a historical compatibility POV I don't think we should be changing the host capabilities at all. Simply document that 'sockets' is treated as sockets-per-node everywhere, and that it is wrong in the case of machines where a socket can internally contain multiple NUMA nodes.

I too am somewhat concerned about changing this output, for historical reasons.

Apps should be using the separate NUMA topology data in the capabilities instead of the CPU topology data, to get accurate CPU counts.

From the NUMA topology the management apps can't tell if the CPU is a core or a thread. For example, oVirt/VDSM bases its decisions on this information.

Then we should add information to the NUMA topology XML to indicate which of the child <cpu> elements are sibling cores or threads. Perhaps add a 'socket_id' + 'core_id' attribute to every <cpu>.

In this case we will also need to add the thread siblings, and perhaps even the core siblings, information to allow reliable detection.

The combination of core_id/socket_id lets you determine that. If two CPUs have the same socket_id, then they are cores or threads within the same socket. If two CPUs have the same socket_id and core_id, then they are threads within the same core.

Not true for the AMD Magny-Cours 6100 series, where different cores can share the same physical_id and core_id, and they are not threads. This processor has two NUMA nodes inside the same package (aka socket), and they share the same core ID set. Annoying.
Re: [libvirt] [RFC] Data in the topology element in the capabilities XML
On Wed, Jan 16, 2013 at 05:06:21PM -0300, Amador Pahim wrote:
On 01/16/2013 04:30 PM, Daniel P. Berrange wrote:
On Wed, Jan 16, 2013 at 02:15:37PM -0500, Peter Krempa wrote:

----- Original Message -----
From: Daniel P. Berrange berra...@redhat.com
To: Peter Krempa pkre...@redhat.com
Cc: Jiri Denemark jdene...@redhat.com, Amador Pahim apa...@redhat.com, libvirt-l...@redhat.com, dougsl...@redhat.com
Sent: Wed, 16 Jan 2013 13:39:28 -0500 (EST)
Subject: Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

On Wed, Jan 16, 2013 at 07:31:02PM +0100, Peter Krempa wrote:
On 01/16/13 19:11, Daniel P. Berrange wrote:
On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:

Hi everybody,

a while ago there was a discussion about changing the data that is returned in the topology sub-element:

<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='2' threads='2'/>

The data provided here is as of today taken from the nodeinfo detection code and thus is really wrong when the fallback mechanisms are used. To get a useful count, the user has to multiply the data by the number of NUMA nodes in the host. With the fallback detection code used for nodeinfo, the NUMA node count used to get the CPU count should be 1 instead of the actual number.

As Jiri proposed, I think we should switch this output to separate detection code that will not take NUMA nodes into account and will instead provide the data the way the lscpu command does. This change will make the data provided by the element standalone and also usable in guest XMLs to mirror the host's topology.

Well, there are 2 parts which need to be considered here: what do we report in the host capabilities, and how do you configure guest XML.

From a historical compatibility POV I don't think we should be changing the host capabilities at all. Simply document that 'sockets' is treated as sockets-per-node everywhere, and that it is wrong in the case of machines where a socket can internally contain multiple NUMA nodes.

I too am somewhat concerned about changing this output, for historical reasons.

Apps should be using the separate NUMA topology data in the capabilities instead of the CPU topology data, to get accurate CPU counts.

From the NUMA topology the management apps can't tell if the CPU is a core or a thread. For example, oVirt/VDSM bases its decisions on this information.

Then we should add information to the NUMA topology XML to indicate which of the child <cpu> elements are sibling cores or threads. Perhaps add a 'socket_id' + 'core_id' attribute to every <cpu>.

In this case we will also need to add the thread siblings, and perhaps even the core siblings, information to allow reliable detection.

The combination of core_id/socket_id lets you determine that. If two CPUs have the same socket_id, then they are cores or threads within the same socket. If two CPUs have the same socket_id and core_id, then they are threads within the same core.

Not true for the AMD Magny-Cours 6100 series, where different cores can share the same physical_id and core_id, and they are not threads. This processor has two NUMA nodes inside the same package (aka socket), and they share the same core ID set. Annoying.

I don't believe there's a problem with that.
This example XML shows a machine with 4 NUMA nodes and 2 sockets, each node containing 2 cores with 2 threads each, giving 16 logical CPUs:

<topology>
  <cells num='4'>
    <cell id='0'>
      <cpus num='4'>
        <cpu id='0' socket_id='0' core_id='0'/>
        <cpu id='1' socket_id='0' core_id='0'/>
        <cpu id='2' socket_id='0' core_id='1'/>
        <cpu id='3' socket_id='0' core_id='1'/>
      </cpus>
    </cell>
    <cell id='1'>
      <cpus num='4'>
        <cpu id='4' socket_id='0' core_id='0'/>
        <cpu id='5' socket_id='0' core_id='0'/>
        <cpu id='6' socket_id='0' core_id='1'/>
        <cpu id='7' socket_id='0' core_id='1'/>
      </cpus>
    </cell>
    <cell id='2'>
      <cpus num='4'>
        <cpu id='8' socket_id='1' core_id='0'/>
        <cpu id='9' socket_id='1' core_id='0'/>
        <cpu id='10' socket_id='1' core_id='1'/>
        <cpu id='11' socket_id='1' core_id='1'/>
      </cpus>
    </cell>
    <cell id='3'>
      <cpus num='4'>
        <cpu id='12' socket_id='1' core_id='0'/>
        <cpu id='13' socket_id='1' core_id='0'/>
        <cpu id='14' socket_id='1' core_id='1'/>
        <cpu id='15' socket_id='1' core_id='1'/>
      </cpus>
    </cell>
  </cells>
</topology>

I believe there's enough info there to determine all the co-location aspects of all the sockets/cores/threads involved.

Regards,
Daniel
Re: [libvirt] [RFC] Data in the topology element in the capabilities XML
On 01/16/13 21:24, Daniel P. Berrange wrote:
On Wed, Jan 16, 2013 at 05:06:21PM -0300, Amador Pahim wrote:
On 01/16/2013 04:30 PM, Daniel P. Berrange wrote:
On Wed, Jan 16, 2013 at 02:15:37PM -0500, Peter Krempa wrote:

----- Original Message -----
From: Daniel P. Berrange berra...@redhat.com
To: Peter Krempa pkre...@redhat.com
Cc: Jiri Denemark jdene...@redhat.com, Amador Pahim apa...@redhat.com, libvirt-l...@redhat.com, dougsl...@redhat.com
Sent: Wed, 16 Jan 2013 13:39:28 -0500 (EST)
Subject: Re: [libvirt] [RFC] Data in the topology element in the capabilities XML

On Wed, Jan 16, 2013 at 07:31:02PM +0100, Peter Krempa wrote:
On 01/16/13 19:11, Daniel P. Berrange wrote:
On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:

Hi everybody,

a while ago there was a discussion about changing the data that is returned in the topology sub-element:

<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>SandyBridge</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='2' threads='2'/>

The data provided here is as of today taken from the nodeinfo detection code and thus is really wrong when the fallback mechanisms are used. To get a useful count, the user has to multiply the data by the number of NUMA nodes in the host. With the fallback detection code used for nodeinfo, the NUMA node count used to get the CPU count should be 1 instead of the actual number.

As Jiri proposed, I think we should switch this output to separate detection code that will not take NUMA nodes into account and will instead provide the data the way the lscpu command does. This change will make the data provided by the element standalone and also usable in guest XMLs to mirror the host's topology.

Well, there are 2 parts which need to be considered here: what do we report in the host capabilities, and how do you configure guest XML.

From a historical compatibility POV I don't think we should be changing the host capabilities at all. Simply document that 'sockets' is treated as sockets-per-node everywhere, and that it is wrong in the case of machines where a socket can internally contain multiple NUMA nodes.

I too am somewhat concerned about changing this output, for historical reasons.

Apps should be using the separate NUMA topology data in the capabilities instead of the CPU topology data, to get accurate CPU counts.

From the NUMA topology the management apps can't tell if the CPU is a core or a thread. For example, oVirt/VDSM bases its decisions on this information.

Then we should add information to the NUMA topology XML to indicate which of the child <cpu> elements are sibling cores or threads. Perhaps add a 'socket_id' + 'core_id' attribute to every <cpu>.

In this case we will also need to add the thread siblings, and perhaps even the core siblings, information to allow reliable detection.

The combination of core_id/socket_id lets you determine that. If two CPUs have the same socket_id, then they are cores or threads within the same socket. If two CPUs have the same socket_id and core_id, then they are threads within the same core.

Not true for the AMD Magny-Cours 6100 series, where different cores can share the same physical_id and core_id, and they are not threads. This processor has two NUMA nodes inside the same package (aka socket), and they share the same core ID set. Annoying.

I don't believe there's a problem with that.
This example XML shows a machine with 4 NUMA nodes and 2 sockets, each node containing 2 cores with 2 threads each, giving 16 logical CPUs:

<topology>
  <cells num='4'>
    <cell id='0'>
      <cpus num='4'>
        <cpu id='0' socket_id='0' core_id='0'/>
        <cpu id='1' socket_id='0' core_id='0'/>
        <cpu id='2' socket_id='0' core_id='1'/>
        <cpu id='3' socket_id='0' core_id='1'/>
      </cpus>
    </cell>
    <cell id='1'>
      <cpus num='4'>
        <cpu id='4' socket_id='0' core_id='0'/>
        <cpu id='5' socket_id='0' core_id='0'/>
        <cpu id='6' socket_id='0' core_id='1'/>
        <cpu id='7' socket_id='0' core_id='1'/>
      </cpus>
    </cell>
    <cell id='2'>
      <cpus num='4'>
        <cpu id='8' socket_id='1' core_id='0'/>
        <cpu id='9' socket_id='1' core_id='0'/>
        <cpu id='10' socket_id='1' core_id='1'/>
        <cpu id='11' socket_id='1' core_id='1'/>
      </cpus>
    </cell>
    <cell id='3'>
      <cpus num='4'>
        <cpu id='12' socket_id='1' core_id='0'/>
        <cpu id='13' socket_id='1' core_id='0'/>
        <cpu id='14' socket_id='1' core_id='1'/>
        <cpu id='15' socket_id='1' core_id='1'/>
      </cpus>
    </cell>
  </cells>
</topology>

I believe there's enough info there to determine all the co-location aspects of all the sockets/cores/threads involved.

Well, not for all machines in the wild out there. This is very similar to the approach libvirt uses now to detect the topology, and it is not enough to detect threads on AMD Bulldozer, as the CPUs corresponding to the threads have different core_ids.