[Beowulf] bizarre scaling behavior on a Nehalem

2009-08-10 Thread Rahul Nabar
A while ago Tiago Marques had provided some benchmarking info in a thread ( http://www.beowulf.org/archive/2009-May/025739.html ) and some recent tests that I've been doing made me interested in this snippet again: One of the codes, VASP, is very bandwidth limited and loves to run in a number of

Re: [Beowulf] numactl SuSE11.1

2009-08-10 Thread Mikhail Kuzminsky
I'm sorry for my mistake: the problem is on Nehalem Xeon under SuSE 11.1, but w/kernel 2.6.27.7-9 (w/Supermicro X8DT mobo). For Opteron 2350 w/SuSE 10.3 (w/ the older 2.6.22.5-31 - I erroneously inserted this string in my previous message) numactl works OK (w/Tyan mobo). NUMA is enabled in
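A quick way to see how many NUMA nodes the kernel actually exposes is to count the "node N cpus:" lines that numactl --hardware prints (one per node; a single node usually means NUMA is off or hidden by the BIOS). This is a minimal sketch parsing a sample of that output, since exact formatting varies across numactl versions:

```shell
# Sample output in the style of "numactl --hardware" on a 2-node box
# (hypothetical values; run the real command to get yours).
sample='available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 1 cpus: 4 5 6 7'

# Count the per-node cpu lines.
nodes=$(printf '%s\n' "$sample" | grep -c '^node [0-9]* cpus:')
echo "NUMA nodes: $nodes"
```

On the real machine, replace the sample with `numactl --hardware` output; if it reports only one node on a two-socket Nehalem, NUMA is likely disabled in the BIOS or the kernel.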

Re: [Beowulf] performance tweaks and optimum memory configs for a Nehalem

2009-08-10 Thread Mark Hahn
this is on the machine which reports 16 cores, right? I'm guessing that the kernel is compiled without numa and/or ht, so enumerates virtual cpus first. that would mean that when otherwise idle, a 2-core proc will get virtual cores within the same physical core. and that your 8c test is merely
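Mark's guess about enumeration order can be checked by mapping each logical "processor" entry in /proc/cpuinfo to its (physical id, core id) pair: two logical CPUs sharing the same pair are hyperthread siblings. A minimal sketch, parsing a sample cpuinfo fragment (the real mapping varies by kernel and BIOS):

```shell
# Sample /proc/cpuinfo fragment: logical cpus 0 and 8 sharing one
# physical core, i.e. hyperthread siblings (hypothetical numbering).
sample='processor	: 0
physical id	: 0
core id	: 0

processor	: 8
physical id	: 0
core id	: 0'

# Emit one "cpu N -> package P core C" line per logical cpu.
map=$(printf '%s\n' "$sample" | awk -F: '
  /processor/   { cpu = $2 }
  /physical id/ { pkg = $2 }
  /core id/     { gsub(/[[:space:]]/, "", cpu); gsub(/[[:space:]]/, "", pkg);
                  gsub(/[[:space:]]/, "", $2);
                  printf "cpu %s -> package %s core %s\n", cpu, pkg, $2 }')
echo "$map"
```

On a live box, feed it `/proc/cpuinfo` instead of the sample; if two of your "2-core" run's CPUs land on the same package/core pair, they are siblings and will fight for one physical core.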

Re: [Beowulf] performance tweaks and optimum memory configs for a Nehalem

2009-08-10 Thread Renato Callado Borges
On Mon, Aug 10, 2009 at 08:33:27AM -0400, Mark Hahn wrote: Is there a way of finding out within Linux if Hyperthreading is on or not? in /proc/cpuinfo, I believe it's as simple as comparing siblings vs. cpu cores. that is, I'm guessing one of your Nehalems shows as having 8 siblings and 4 cpu cores.
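That siblings-vs-cores comparison can be scripted directly. A minimal sketch, parsing a sample /proc/cpuinfo fragment (on a real box, read the file itself; siblings > cpu cores means hyperthreading is on):

```shell
# Sample fragment in /proc/cpuinfo's format: a Nehalem with HT on
# shows 8 siblings (logical cpus per package) but 4 cpu cores.
cpuinfo='siblings	: 8
cpu cores	: 4'

# Pull the first value of each field.
siblings=$(printf '%s\n' "$cpuinfo" | awk -F: '/siblings/  { gsub(/[[:space:]]/, "", $2); print $2; exit }')
cores=$(printf '%s\n' "$cpuinfo"    | awk -F: '/cpu cores/ { gsub(/[[:space:]]/, "", $2); print $2; exit }')

if [ "$siblings" -gt "$cores" ]; then
  echo "hyperthreading: on"
else
  echo "hyperthreading: off"
fi
```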

Re: [Beowulf] performance tweaks and optimum memory configs for a Nehalem

2009-08-10 Thread Joshua Baker-LePain
On Mon, 10 Aug 2009 at 11:43am, Rahul Nabar wrote On Mon, Aug 10, 2009 at 7:41 AM, Mark Hahn h...@mcmaster.ca wrote: (a) I am seeing strange scaling behaviours with Nehalem cores. eg A specific DFT (Density Functional Theory) code we use is maxing out performance at 2, 4 cpus instead of 8. i.e.

Re: [Beowulf] performance tweaks and optimum memory configs for a Nehalem

2009-08-10 Thread Gus Correa
Joshua Baker-LePain wrote: On Mon, 10 Aug 2009 at 11:43am, Rahul Nabar wrote On Mon, Aug 10, 2009 at 7:41 AM, Mark Hahn h...@mcmaster.ca wrote: (a) I am seeing strange scaling behaviours with Nehalem cores. eg A specific DFT (Density Functional Theory) code we use is maxing out performance at

Re: [Beowulf] performance tweaks and optimum memory configs for a Nehalem

2009-08-10 Thread Rahul Nabar
On Mon, Aug 10, 2009 at 2:09 PM, Joshua Baker-LePain jl...@duke.edu wrote: Well, as there are only 8 real cores, running a computationally intensive process across 16 should *definitely* do worse than across 8. However, it's not so surprising that you're seeing peak performance with 2-4 threads.

Re: [Beowulf] performance tweaks and optimum memory configs for a Nehalem

2009-08-10 Thread Joshua Baker-LePain
On Mon, 10 Aug 2009 at 3:02pm, Rahul Nabar wrote On Mon, Aug 10, 2009 at 2:09 PM, Joshua Baker-LePain jl...@duke.edu wrote: Well, as there are only 8 real cores, running a computationally intensive process across 16 should *definitely* do worse than across 8. However, it's not so surprising

Re: [Beowulf] performance tweaks and optimum memory configs for a Nehalem

2009-08-10 Thread Bill Broadley
Joshua Baker-LePain wrote: Well, as there are only 8 real cores, running a computationally intensive process across 16 should *definitely* do worse than across 8. I've seen many cases where that isn't true. The P4 rarely justified turning on HT because throughput would often be lower. With

Re: [Beowulf] performance tweaks and optimum memory configs for a Nehalem

2009-08-10 Thread Craig Tierney
Joshua Baker-LePain wrote: On Mon, 10 Aug 2009 at 11:43am, Rahul Nabar wrote On Mon, Aug 10, 2009 at 7:41 AM, Mark Hahn h...@mcmaster.ca wrote: (a) I am seeing strange scaling behaviours with Nehalem cores. eg A specific DFT (Density Functional Theory) code we use is maxing out performance at

RE: [Beowulf] performance tweaks and optimum memory configs for a Nehalem

2009-08-10 Thread Tom Elken
Well, as there are only 8 real cores, running a computationally intensive process across 16 should *definitely* do worse than across 8. Not typically. At the SPEC website there are quite a few SPEC MPI2007 (which is an average across 13 HPC applications) results on Nehalem. Summary: IBM, SGI

Re: [Beowulf] bizarre scaling behavior on a Nehalem

2009-08-10 Thread Rahul Nabar
On Mon, Aug 10, 2009 at 12:48 PM, Bruno Coutinho couti...@dcc.ufmg.br wrote: This is often caused by cache competition or memory bandwidth saturation. If it was cache competition, rising from 4 to 6 threads would make it worse. As the code became faster with DDR3-1600 and much slower with Xeon