A while ago Tiago Marques provided some benchmarking info in a
thread ( http://www.beowulf.org/archive/2009-May/025739.html ), and
some recent tests I've been doing made me interested in this
snippet again:
One of the codes, VASP, is very bandwidth limited and loves to run in a number of ...
I'm sorry for my mistake:
the problem is on the Nehalem Xeon under SuSE 11.1 with kernel
2.6.27.7-9 (on a Supermicro X8DT motherboard). For the Opteron 2350
with SuSE 10.3 (and the older kernel 2.6.22.5-31; I erroneously
inserted that string in my previous message), numactl works OK (on a
Tyan motherboard).
NUMA is enabled in ...
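A quick way to see whether the running kernel actually exposes NUMA
topology is to read the node directories under sysfs. A minimal sketch
in Python (the sysfs paths are standard on Linux; the script itself is
just illustrative):

    import glob

    # List the NUMA nodes the kernel exposes and the CPUs on each.
    for node in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
        with open(node + "/cpulist") as f:
            print(node.rsplit("/", 1)[-1], "-> CPUs", f.read().strip())

If only node0 shows up on a two-socket Nehalem box, the kernel was
likely built or booted without NUMA support, which would explain
numactl misbehaving.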
this is on the machine which reports 16 cores, right? I'm guessing
that the kernel is compiled without NUMA and/or HT awareness, so it
enumerates virtual CPUs first. That would mean that when otherwise
idle, a 2-core job will get virtual cores within the same physical
core, and that your 8-core test is merely ...
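One way to check how logical CPUs actually map onto physical cores is
the topology files in sysfs. A sketch (assuming a kernel that populates
/sys/devices/system/cpu/*/topology; the grouping logic is illustrative):

    from collections import defaultdict
    from pathlib import Path

    # Group logical CPUs by (package, core); HT siblings share a pair.
    cores = defaultdict(list)
    for topo in Path("/sys/devices/system/cpu").glob("cpu[0-9]*/topology"):
        cpu = int(topo.parent.name[3:])
        pkg = int((topo / "physical_package_id").read_text())
        core = int((topo / "core_id").read_text())
        cores[(pkg, core)].append(cpu)

    for (pkg, core), cpus in sorted(cores.items()):
        print(f"package {pkg} core {core}: logical CPUs {sorted(cpus)}")

If logical CPUs 0 and 1 land on the same (package, core) pair, virtual
CPUs are enumerated adjacently, and an unpinned 2-process run can end
up sharing one physical core.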
On Mon, Aug 10, 2009 at 08:33:27AM -0400, Mark Hahn wrote:
Is there a way of finding out within Linux if Hyperthreading is on or
not?
in /proc/cpuinfo, I believe it's as simple as comparing "siblings"
with "cpu cores". that is, I'm guessing one of your Nehalems shows as
having 8 siblings and 4 cpu cores.
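That check is easy to script. A minimal sketch (the /proc/cpuinfo field
names are standard on Linux; the helper function is hypothetical):

    # HT is on when a package reports more logical siblings than
    # physical cores (e.g. 8 siblings vs 4 cpu cores on Nehalem).
    def ht_enabled(path="/proc/cpuinfo"):
        siblings = cores = None
        with open(path) as f:
            for line in f:
                key, _, value = line.partition(":")
                key = key.strip()
                if key == "siblings":
                    siblings = int(value)
                elif key == "cpu cores":
                    cores = int(value)
                if siblings and cores:
                    return siblings > cores
        return False

    print("HyperThreading:", "on" if ht_enabled() else "off")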
On Mon, 10 Aug 2009 at 11:43am, Rahul Nabar wrote
On Mon, Aug 10, 2009 at 7:41 AM, Mark Hahn <h...@mcmaster.ca> wrote:
(a) I am seeing strange scaling behaviour with Nehalem cores, e.g. a
specific DFT (Density Functional Theory) code we use is maxing out
performance at 2 or 4 CPUs instead of 8, i.e. ...
Joshua Baker-LePain wrote:
On Mon, 10 Aug 2009 at 11:43am, Rahul Nabar wrote
On Mon, Aug 10, 2009 at 7:41 AM, Mark Hahn <h...@mcmaster.ca> wrote:
(a) I am seeing strange scaling behaviour with Nehalem cores, e.g. a
specific DFT (Density Functional Theory) code we use is maxing out
performance at ...
On Mon, Aug 10, 2009 at 2:09 PM, Joshua Baker-LePain <jl...@duke.edu> wrote:
Well, as there are only 8 real cores, running a computationally intensive
process across 16 should *definitely* do worse than across 8. However, it's
not so surprising that you're seeing peak performance with 2-4 threads.
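When HT is on and the code is bandwidth-bound, it can help to pin one
process per physical core so an 8-way run doesn't land on HT siblings.
A sketch using Linux CPU affinity (os.sched_setaffinity needs Python
3.3+; the core-selection logic is illustrative):

    import os
    from pathlib import Path

    def one_cpu_per_core():
        # Keep the lowest-numbered logical CPU of each physical core.
        chosen = {}
        for topo in Path("/sys/devices/system/cpu").glob("cpu[0-9]*/topology"):
            cpu = int(topo.parent.name[3:])
            key = ((topo / "physical_package_id").read_text().strip(),
                   (topo / "core_id").read_text().strip())
            if key not in chosen or cpu < chosen[key]:
                chosen[key] = cpu
        return set(chosen.values())

    os.sched_setaffinity(0, one_cpu_per_core())  # pin this process

An MPI launcher's built-in binding (or taskset/numactl) accomplishes
the same thing without any code.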
On Mon, 10 Aug 2009 at 3:02pm, Rahul Nabar wrote
On Mon, Aug 10, 2009 at 2:09 PM, Joshua Baker-LePain <jl...@duke.edu> wrote:
Well, as there are only 8 real cores, running a computationally intensive
process across 16 should *definitely* do worse than across 8. However, it's
not so surprising ...
Joshua Baker-LePain wrote:
Well, as there are only 8 real cores, running a computationally
intensive process across 16 should *definitely* do worse than across 8.
I've seen many cases where that isn't true. On the P4, turning on HT
was rarely justified because throughput would often be lower. With ...
Joshua Baker-LePain wrote:
On Mon, 10 Aug 2009 at 11:43am, Rahul Nabar wrote
On Mon, Aug 10, 2009 at 7:41 AM, Mark Hahn <h...@mcmaster.ca> wrote:
(a) I am seeing strange scaling behaviour with Nehalem cores, e.g. a
specific DFT (Density Functional Theory) code we use is maxing out
performance at ...
Well, as there are only 8 real cores, running a computationally
intensive process across 16 should *definitely* do worse than across 8.
Not typically.
At the SPEC website there are quite a few SPEC MPI2007 results (an
average across 13 HPC applications) on Nehalem.
Summary:
IBM, SGI ...
On Mon, Aug 10, 2009 at 12:48 PM, Bruno Coutinho <couti...@dcc.ufmg.br> wrote:
This is often caused by cache competition or memory bandwidth saturation.
If it were cache competition, going from 4 to 6 threads would make it worse.
As the code became faster with DDR3-1600 and much slower with Xeon ...