Re: [Ganglia-developers] Correct counting of CPUs, Cores, Siblings (bz #84)
On Fri, Jan 05, 2007 at 05:23:47PM -, [EMAIL PROTECTED] wrote:
> Also remember about the cygwin agent build, which also processes from cygwin's /proc.

The cpu_num metric was hardcoded to return 1 until 3.0.5, so that one will change regardless, but this time it will show the right number of CPUs instead (using a native Windows call).

The percentage values, which use /proc/stat, are independent of the number of CPUs and therefore will be unaffected, and the CPU load metrics are broken regardless anyway.

Carlo
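To illustrate why the /proc/stat percentages do not depend on the CPU count, here is a minimal Python sketch (not the gmond code; the function names `parse_cpu_line` and `cpu_percentages` are made up for this example). It assumes the usual layout where the aggregate `cpu` line carries cumulative jiffy counters, and it treats every counter on that line as a separate state, which is a simplification.

```python
def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        # The aggregate line is tagged exactly 'cpu'; per-CPU lines are 'cpu0', 'cpu1', ...
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Share of elapsed jiffies spent in each state between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        return [0.0] * len(deltas)
    # Each value is a ratio of the total, so the number of CPUs cancels out.
    return [100.0 * d / total for d in deltas]


if __name__ == "__main__":
    # Hypothetical samples: user, nice, system, idle.
    first = parse_cpu_line("cpu  100 0 50 850\n")
    second = parse_cpu_line("cpu  300 0 150 1550\n")
    print(cpu_percentages(first, second))  # [20.0, 0.0, 10.0, 70.0]
```

Doubling every counter, as a machine with twice as many equally loaded CPUs would, yields the same percentages.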
Re: [Ganglia-developers] Correct counting of CPUs, Cores, Siblings (bz #84)
--- Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote:
> On Fri, Dec 22, 2006 at 08:05:02AM -0800, Martin Knoblauch wrote:
> > Hi Folks,
> >
> > in order to fix bz#84 for Linux.
>
> http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=84
>
> I think that the fix for this bug should actually include adding 2 more metrics, as the problem as stated isn't really that ganglia isn't reporting the right count of CPUs, but that there is no way to know whether they are virtual or real CPUs, for inventory and to some extent also scheduling reasons.
>
> This way cpu_num could be kept as the number of available CPUs, as the current documentation for this metric implicitly describes, and we would have cpu_cores and cpu_sockets as the number of available cores and available sockets.
>
> Of course for HPC, the number of effective CPUs is a function of all those 3 and the type of code that is being run, so we should leave it up to the end users to figure that out while giving them all the information they need for that.
>
> The advantages of doing it this way are that the code is greatly simplified, all possible use cases are covered and the metric is kept backward compatible.
>
> comments, anyone?
>
> Carlo

Carlo,

modulo the naming of the new metrics, I completely agree with you. In order to make an educated guess, we need all three components. And we should not forget that more and more clusters in use are running non-HPC workloads, where the virtual CPUs may actually be of use.

One thing we should at least keep in mind is the fact that the number of CPUs may no longer be a constant: CPU hotplugging is available on Linux and some of the proprietary Unixes. Same for memory. And the CPU frequency has been variable for years.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de
Re: [Ganglia-developers] Correct counting of CPUs, Cores, Siblings (bz #84)
On Wed, Dec 27, 2006 at 03:12:51AM -0800, Martin Knoblauch wrote:
> What I now need is the output from 2.4 based configs. Only multi-core and/or HT-enabled systems actually.

AFAIK in vanilla kernel 2.4 there is no way to tell between virtual and real CPUs from /proc/cpuinfo, as can be seen in the following examples:

HW: Dual CPU 2.66GHz, HT enabled
SW: MDK 9.1 with a recompiled 2.4.21 kernel with ACPI_HT_ONLY=y

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.66GHz
stepping : 9
cpu MHz : 2665.947
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 5321.52

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.66GHz
stepping : 9
cpu MHz : 2665.947
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 5321.52

processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.66GHz
stepping : 9
cpu MHz : 2665.947
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 5321.52

processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.66GHz
stepping : 9
cpu MHz : 2665.947
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 5321.52

HW: Dual CPU 2.8GHz, HT enabled
SW: MDK 9.1 with a custom compiled 2.4.32 kernel and booted with acpi=ht

processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 10
cpu MHz : 2800.242
cache size : 16 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm pni monitor ds_cpl cid
bogomips : 5583.66

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 10
cpu MHz : 2800.242
cache size : 16 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm pni monitor ds_cpl cid
bogomips : 5596.77

processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 10
cpu MHz : 2800.242
cache size : 16 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm pni monitor ds_cpl cid
bogomips : 5596.77

processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 10
cpu MHz : 2800.242
cache size : 16 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm pni monitor ds_cpl cid
bogomips : 5596.77

at boot
Re: [Ganglia-developers] Correct counting of CPUs, Cores, Siblings (bz #84)
On Fri, Dec 22, 2006 at 08:05:02AM -0800, Martin Knoblauch wrote:
> Hi Folks,
>
> in order to fix bz#84 for Linux.

http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=84

I think that the fix for this bug should actually include adding 2 more metrics, as the problem as stated isn't really that ganglia isn't reporting the right count of CPUs, but that there is no way to know whether they are virtual or real CPUs, for inventory and to some extent also scheduling reasons.

This way cpu_num could be kept as the number of available CPUs, as the current documentation for this metric implicitly describes, and we would have cpu_cores and cpu_sockets as the number of available cores and available sockets.

Of course for HPC, the number of effective CPUs is a function of all those 3 and the type of code that is being run, so we should leave it up to the end users to figure that out while giving them all the information they need for that.

The advantages of doing it this way are that the code is greatly simplified, all possible use cases are covered and the metric is kept backward compatible.

comments, anyone?

Carlo
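As a rough illustration of the three-metric idea, the following Python sketch derives cpu_num, cpu_cores and cpu_sockets from x86-style /proc/cpuinfo text. It is only a sketch under assumptions, not the proposed gmond patch: the helper names `parse_cpuinfo` and `cpu_metrics` are invented here, each logical CPU block is assumed to start with a `processor` line, and returning `None` when the `physical id` / `core id` fields are missing (as on the 2.4 kernels discussed in this thread) is just one possible policy.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo text into one dict per logical processor."""
    cpus = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank or malformed line
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpus.append({})  # a 'processor' line opens a new block
        if cpus:
            cpus[-1][key] = value
    return cpus


def cpu_metrics(text):
    """Return cpu_num, cpu_cores and cpu_sockets (None when not derivable)."""
    cpus = parse_cpuinfo(text)
    cpu_num = len(cpus)
    has_topology = bool(cpus) and all(
        "physical id" in c and "core id" in c for c in cpus
    )
    if not has_topology:
        # Older kernels expose no topology fields; only the logical count is known.
        return {"cpu_num": cpu_num, "cpu_cores": None, "cpu_sockets": None}
    sockets = {c["physical id"] for c in cpus}
    # A core is identified by its socket plus its core id within that socket.
    cores = {(c["physical id"], c["core id"]) for c in cpus}
    return {"cpu_num": cpu_num, "cpu_cores": len(cores), "cpu_sockets": len(sockets)}


if __name__ == "__main__":
    with open("/proc/cpuinfo") as handle:
        print(cpu_metrics(handle.read()))
```

Keeping cpu_num as the plain count of `processor` entries is what preserves backward compatibility in this sketch; the other two values are purely additive.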
Re: [Ganglia-developers] Correct counting of CPUs, Cores, Siblings (bz #84)
On Friday 22 December 2006 11:05, Martin Knoblauch wrote:
> Hi Folks,
>
> in order to fix bz#84 for Linux, I would like to collect some data from different system configurations. Could you please create the file cpu.grep and execute the cat/grep chain below. Please report the results together with uname -a output and which distro you are running.
>
> # more cpu.grep
> processor
> vendor
> model name
> physical id
> siblings
> core id
> cpu cores
> # cat /proc/cpuinfo | grep -f cpu.grep

Here's the data from my Fedora Core 6 workstation in the office, since it's fairly interesting for this specific topic. It's a dual-socket, dual-core Xeon system with hyperthreading turned on, so two sockets, four cores, eight logical cpus...

Linux xavier.boston.redhat.com 2.6.18-1.2849.fc6 #1 SMP Fri Nov 10 12:34:46 EST 2006 x86_64 x86_64 x86_64 GNU/Linux

processor : 0
vendor_id : GenuineIntel
model name : Intel(R) Xeon(TM) CPU 3.00GHz
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
processor : 1
vendor_id : GenuineIntel
model name : Intel(R) Xeon(TM) CPU 3.00GHz
physical id : 1
siblings : 4
core id : 0
cpu cores : 2
processor : 2
vendor_id : GenuineIntel
model name : Intel(R) Xeon(TM) CPU 3.00GHz
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
processor : 3
vendor_id : GenuineIntel
model name : Intel(R) Xeon(TM) CPU 3.00GHz
physical id : 1
siblings : 4
core id : 1
cpu cores : 2
processor : 4
vendor_id : GenuineIntel
model name : Intel(R) Xeon(TM) CPU 3.00GHz
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
processor : 5
vendor_id : GenuineIntel
model name : Intel(R) Xeon(TM) CPU 3.00GHz
physical id : 1
siblings : 4
core id : 0
cpu cores : 2
processor : 6
vendor_id : GenuineIntel
model name : Intel(R) Xeon(TM) CPU 3.00GHz
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
processor : 7
vendor_id : GenuineIntel
model name : Intel(R) Xeon(TM) CPU 3.00GHz
physical id : 1
siblings : 4
core id : 1
cpu cores : 2

--
Jarod Wilson
[EMAIL PROTECTED]
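The "two sockets, four cores, eight logical cpus" reading of the output above can be cross-checked with a few lines of Python. This is a sketch only: `topology_summary` is a hypothetical helper, the records are typed in by hand from the listing rather than parsed, and it relies on `siblings` being the number of logical CPUs per physical package and `cpu cores` the number of cores per package.

```python
def topology_summary(records):
    """Summarize per-logical-CPU records holding the integer fields
    'physical id', 'core id', 'siblings' and 'cpu cores'."""
    records = list(records)
    sockets = {r["physical id"] for r in records}
    cores = {(r["physical id"], r["core id"]) for r in records}
    # Logical CPUs per package divided by cores per package = threads per core.
    ratios = {r["siblings"] // r["cpu cores"] for r in records}
    return {
        "sockets": len(sockets),
        "cores": len(cores),
        "logical": len(records),
        # None if packages disagree (or there are no records at all).
        "threads_per_core": ratios.pop() if len(ratios) == 1 else None,
    }


# The eight entries from the listing: (physical id, core id) per processor 0..7.
IDS = [(0, 0), (1, 0), (0, 1), (1, 1), (0, 0), (1, 0), (0, 1), (1, 1)]
RECORDS = [
    {"physical id": p, "core id": c, "siblings": 4, "cpu cores": 2}
    for p, c in IDS
]

if __name__ == "__main__":
    print(topology_summary(RECORDS))
    # {'sockets': 2, 'cores': 4, 'logical': 8, 'threads_per_core': 2}
```

A threads-per-core value above 1 is what distinguishes hyperthreaded logical CPUs from real cores here; on the 2.4 outputs earlier in the thread the required fields are absent, so this check cannot be made.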