Re: [Ganglia-developers] Correct counting of CPUs, Cores, Siblings (bz #84)

2007-01-07 Thread Carlo Marcelo Arenas Belon
On Fri, Jan 05, 2007 at 05:23:47PM -, [EMAIL PROTECTED] wrote:
> Also remember about the cygwin agent build, which also processes
> from cygwin's /proc.

the cpu_num metric was hardcoded to return 1 until 3.0.5, so that one will
change regardless, but this time it will show the right number of CPUs
instead (using a native windows call).

the percentage values, which are derived from /proc/stat, are independent of
the number of CPUs and will therefore be unaffected; the cpu load metrics are
broken regardless of this change.
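
To illustrate why the percentages do not depend on the CPU count, here is a
minimal Python sketch (not the gmond code). It assumes the usual layout of the
aggregate "cpu" line in Linux's /proc/stat, where every field is a jiffy
counter summed over all CPUs; the function name cpu_percentages is
hypothetical. Both the per-state delta and the total scale with the number of
CPUs, so the ratio stays the same.

```python
def cpu_percentages(before: str, after: str) -> dict:
    """Percent of time per state between two samples of the aggregate
    'cpu' line of /proc/stat (counters are summed over all CPUs)."""
    names = ["user", "nice", "system", "idle"]

    def fields(line: str) -> list:
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        # Keep at most the first 8 counters (user .. steal). The later
        # guest counters are already included in user/nice on Linux.
        return [int(x) for x in parts[1:9]]

    b, a = fields(before), fields(after)
    deltas = [y - x for x, y in zip(b, a)]
    total = sum(deltas)
    if total == 0:
        return {n: 0.0 for n in names}
    # zip() stops at the four named states; total covers every counter read.
    return {n: 100.0 * d / total for n, d in zip(names, deltas)}
```

Feeding it two samples from a one-CPU box and two samples with every counter
doubled (as a two-CPU box with the same load would show) gives the same
percentages.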

Carlo



Re: [Ganglia-developers] Correct counting of CPUs, Cores, Siblings (bz #84)

2007-01-02 Thread Martin Knoblauch

--- Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote:

> On Fri, Dec 22, 2006 at 08:05:02AM -0800, Martin Knoblauch wrote:
> > Hi Folks,
> >
> >  in order to fix bz#84 for Linux.
> >
> >  http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=84
>
> I think that the fix for this bug should actually include adding 2 more
> metrics, as the problem as stated isn't really that ganglia isn't reporting
> the right count of CPUs, but that there is no way to know if it is virtual or
> real CPUs for inventory and in some sort also scheduling reasons.
>
> This way cpu_num could be kept as the number of available CPUs, as is
> implicitly described to do in the current documentation for this metric and
> will have cpu_cores and cpu_sockets as the number of available cores or
> available sockets.
>
> of course for HPC, the number of effective CPUs is a function of all those 3
> and the type of code that is being run, so we should leave up to the end users
> to figure that out while giving them all the information they need for that.
>
> the advantages of doing it this way, are that the code is greatly simplified,
> all possible use cases are covered and the metric is kept backward compatible.
>
> comments, anyone?
>
> Carlo

Carlo,

 modulo the naming of the new metrics, I completely agree with you. In
order to make an educated guess, we need all three components. And we
should not forget that more and more clusters in use are running
non-HPC workloads, where the virtual CPUs may actually be of use.

 One thing we should at least keep in mind is the fact that the number of
CPUs may no longer be a constant: CPU hotplugging is available on
Linux and some of the proprietary Unixes. The same goes for memory. And the
CPU frequency has been variable for years.
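
For what it is worth, here is a small Python sketch of how a collector could
re-read the online CPU count on every sample instead of caching it at
startup. It assumes a Linux kernel that exposes the list of online CPUs in
/sys/devices/system/cpu/online in the "0-3,5" range format; the helper names
parse_cpu_list and online_cpu_count are hypothetical and nothing like this is
in gmond today.

```python
def parse_cpu_list(text: str) -> list:
    """Expand a kernel CPU list such as '0-3,5' into [0, 1, 2, 3, 5]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty input or stray comma
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def online_cpu_count(path: str = "/sys/devices/system/cpu/online") -> int:
    """Count the CPUs that are online right now (path is an assumption)."""
    with open(path) as f:
        return len(parse_cpu_list(f.read()))
```

Calling online_cpu_count() each time the metric is collected would pick up a
CPU being taken offline or brought back without restarting the daemon.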

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de



Re: [Ganglia-developers] Correct counting of CPUs, Cores, Siblings (bz #84)

2007-01-01 Thread Carlo Marcelo Arenas Belon
On Wed, Dec 27, 2006 at 03:12:51AM -0800, Martin Knoblauch wrote:
>  What I now need is the output from 2.4 based configs. Only multi-core
> and/or HT-enabled systems actually.

AFAIK in a vanilla 2.4 kernel there is no way to tell virtual and real CPUs
apart from /proc/cpuinfo, as can be seen in the following examples:

HW: Dual CPU 2.66GHz, HT enabled
SW: MDK 9.1 with a recompiled 2.4.21 kernel with ACPI_HT_ONLY=y

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 15
model   : 2
model name  : Intel(R) Xeon(TM) CPU 2.66GHz
stepping: 9
cpu MHz : 2665.947
cache size  : 512 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 2
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips: 5321.52

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 15
model   : 2
model name  : Intel(R) Xeon(TM) CPU 2.66GHz
stepping: 9
cpu MHz : 2665.947
cache size  : 512 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 2
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips: 5321.52

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 15
model   : 2
model name  : Intel(R) Xeon(TM) CPU 2.66GHz
stepping: 9
cpu MHz : 2665.947
cache size  : 512 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 2
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips: 5321.52

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 15
model   : 2
model name  : Intel(R) Xeon(TM) CPU 2.66GHz
stepping: 9
cpu MHz : 2665.947
cache size  : 512 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 2
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips: 5321.52

HW: Dual CPU 2.8GHz, HT enabled
SW: MDK 9.1 with a custom compiled 2.4.32 kernel and booted with acpi=ht

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 15
model   : 4
model name  : Intel(R) Xeon(TM) CPU 2.80GHz
stepping: 10
cpu MHz : 2800.242
cache size  : 16 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 5
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm pni monitor
ds_cpl cid
bogomips: 5583.66

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 15
model   : 4
model name  : Intel(R) Xeon(TM) CPU 2.80GHz
stepping: 10
cpu MHz : 2800.242
cache size  : 16 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 5
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm pni monitor
ds_cpl cid
bogomips: 5596.77

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 15
model   : 4
model name  : Intel(R) Xeon(TM) CPU 2.80GHz
stepping: 10
cpu MHz : 2800.242
cache size  : 16 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 5
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm pni monitor
ds_cpl cid
bogomips: 5596.77

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 15
model   : 4
model name  : Intel(R) Xeon(TM) CPU 2.80GHz
stepping: 10
cpu MHz : 2800.242
cache size  : 16 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 5
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm pni monitor
ds_cpl cid
bogomips: 5596.77

at boot 

Re: [Ganglia-developers] Correct counting of CPUs, Cores, Siblings (bz #84)

2007-01-01 Thread Carlo Marcelo Arenas Belon
On Fri, Dec 22, 2006 at 08:05:02AM -0800, Martin Knoblauch wrote:
> Hi Folks,
>
>  in order to fix bz#84 for Linux.
>
>  http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=84

I think that the fix for this bug should actually include adding 2 more
metrics, as the problem as stated isn't really that ganglia isn't reporting
the right count of CPUs, but that there is no way to know whether they are
virtual or real CPUs, which matters for inventory and to some extent also for
scheduling.

This way cpu_num could be kept as the number of available CPUs, as the
current documentation for this metric implicitly describes, and we would
have cpu_cores and cpu_sockets as the number of available cores and
available sockets.

of course for HPC, the number of effective CPUs is a function of all 3 of
those and the type of code that is being run, so we should leave it up to the
end users to figure that out while giving them all the information they need
for that.

the advantages of doing it this way are that the code is greatly simplified,
all possible use cases are covered and the metric is kept backward compatible.
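
As a rough illustration of the three metrics, here is a Python sketch (not a
patch, and not how gmond is written) that derives them from /proc/cpuinfo
text. It assumes the x86 layout where each logical CPU starts with a
"processor" line and, on 2.6 kernels, carries "physical id" and "core id"
fields. The function names are hypothetical. When those fields are missing,
as on the 2.4 kernels discussed elsewhere in this thread, it only reports
cpu_num and leaves the other two unset rather than guessing.

```python
def parse_cpuinfo(text: str) -> list:
    """Split /proc/cpuinfo text into one dict per logical CPU."""
    records, current = [], {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # blank separators and wrapped continuation lines
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor" and current:
            records.append(current)  # a new logical CPU starts here
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records


def cpu_counts(text: str) -> dict:
    """Return cpu_num, cpu_cores and cpu_sockets (None when unknown)."""
    records = parse_cpuinfo(text)
    counts = {"cpu_num": len(records), "cpu_cores": None, "cpu_sockets": None}
    if all("physical id" in r and "core id" in r for r in records):
        # A core is identified by its socket plus its core id in that socket.
        counts["cpu_sockets"] = len({r["physical id"] for r in records})
        counts["cpu_cores"] = len(
            {(r["physical id"], r["core id"]) for r in records}
        )
    return counts
```

cpu_num keeps its current meaning (every logical CPU the kernel lists), so
existing consumers of that metric see no change.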

comments, anyone?

Carlo



Re: [Ganglia-developers] Correct counting of CPUs, Cores, Siblings (bz #84)

2006-12-22 Thread Jarod Wilson
On Friday 22 December 2006 11:05, Martin Knoblauch wrote:
> Hi Folks,
>
>  in order to fix bz#84 for Linux, I would like to collect some data
> from different system configurations. Could you please create the file
> cpu.grep and execute the cat/grep chain below.
>
>  Please report the results together with uname -a output and which distro
> you are running.
>
> # more cpu.grep
> processor
> vendor
> model name
> physical id
> siblings
> core id
> cpu cores
> # cat /proc/cpuinfo  | grep -f cpu.grep

Here's the data from my Fedora Core 6 workstation in the office, since it's
fairly interesting for this specific topic. It's a dual-socket, dual-core Xeon
system with hyperthreading turned on, so two sockets, four cores, eight
logical cpus...

Linux xavier.boston.redhat.com 2.6.18-1.2849.fc6 #1 SMP Fri Nov 10 12:34:46 EST 2006 x86_64 x86_64 x86_64 GNU/Linux

processor   : 0
vendor_id   : GenuineIntel
model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
physical id : 0
siblings: 4
core id : 0
cpu cores   : 2
processor   : 1
vendor_id   : GenuineIntel
model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
physical id : 1
siblings: 4
core id : 0
cpu cores   : 2
processor   : 2
vendor_id   : GenuineIntel
model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
physical id : 0
siblings: 4
core id : 1
cpu cores   : 2
processor   : 3
vendor_id   : GenuineIntel
model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
physical id : 1
siblings: 4
core id : 1
cpu cores   : 2
processor   : 4
vendor_id   : GenuineIntel
model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
physical id : 0
siblings: 4
core id : 0
cpu cores   : 2
processor   : 5
vendor_id   : GenuineIntel
model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
physical id : 1
siblings: 4
core id : 0
cpu cores   : 2
processor   : 6
vendor_id   : GenuineIntel
model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
physical id : 0
siblings: 4
core id : 1
cpu cores   : 2
processor   : 7
vendor_id   : GenuineIntel
model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
physical id : 1
siblings: 4
core id : 1
cpu cores   : 2
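
As a quick cross-check of those numbers, here is a Python sketch of the
arithmetic. It assumes all sockets are identical, that "siblings" is the
number of logical CPUs per socket and that "cpu cores" is the number of cores
per socket; the function name topology_summary is hypothetical.

```python
def topology_summary(physical_ids: list, siblings: int, cpu_cores: int) -> dict:
    """Derive socket, core and thread counts from cpuinfo field values."""
    sockets = len(set(physical_ids))
    return {
        "sockets": sockets,
        "cores": sockets * cpu_cores,              # cores per socket * sockets
        "threads_per_core": siblings // cpu_cores,
        "logical_cpus": sockets * siblings,        # should match the processor count
    }


# Values taken from the output above: physical id alternates 0/1 over
# eight processors, siblings is 4, cpu cores is 2.
summary = topology_summary([0, 1, 0, 1, 0, 1, 0, 1], siblings=4, cpu_cores=2)
```

That gives 2 sockets, 4 cores, 2 threads per core and 8 logical CPUs, which
agrees with the hardware as described.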


-- 
Jarod Wilson
[EMAIL PROTECTED]

