On 10/05/2010 10:23 AM, Storm Zhang wrote:
Sorry, I should say one more thing about the 500-proc test. I tried to run
two 500-proc jobs at the same time using SGE, and each runs fast and
finishes at the same time as a single run. So I think OpenMPI can handle
them separately very well.
For the bind-to-core, I tried to run mpirun --help but
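On the Open MPI 1.4 series that shipped around this time, core binding can be requested directly on the mpirun command line; a minimal sketch (the application name ./my_app is a placeholder, and the exact flag spelling depends on your Open MPI version):

```shell
# Bind each MPI process to a single core (Open MPI 1.4.x-era syntax).
mpirun --bind-to-core -np 500 ./my_app

# Print the effective bindings so the layout can be verified.
mpirun --report-bindings --bind-to-core -np 4 ./my_app
```

Binding prevents the scheduler from migrating ranks between cores (and between a core and its hyperthread sibling), which addresses the "performance less than max when #procs < #cores" point raised later in the thread.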
Storm Zhang wrote:
Here is what I meant: the result for 500 procs in fact shows that with
272-304 (< 500) real cores, the program's running time is good, almost
five times the 100-proc time. So it can be handled very well. Therefore
I guess OpenMPI or the Rocks OS does make use of hyperthreading to do the job.
But
On Oct 4, 2010, at 1:48 PM, Storm Zhang wrote:
> Thanks a lot, Ralph. As I said, I also tried to use SGE (which also shows
> 1024 slots available for parallel tasks), and it assigned only 34-38 compute
> nodes, which have only 272-304 real cores, for the 500-proc run. The running
> time is consistent with 100 procs, without a lot of fluctuation due to the number of
Some of what you are seeing is the natural result of context switching. Some
thoughts regarding the results:
1. You didn't bind your procs to cores when running with #procs < #cores, so
your performance in those scenarios will also be less than max.
2. Once the number of procs exceeds the
Thanks a lot for your reply, Doug.
There is one more thing I forgot to mention. For the 500-proc test, I observe
that if I use SGE, it runs on only about half of our cluster, like 35-38 nodes,
not uniformly distributed over the whole cluster, but the running time is still
good. So I guess it is not a
In my experience hyperthreading can't really deliver two cores' worth of
processing simultaneously for processes expecting sole use of a core. Since you
really have 512 cores, I'm not surprised that you see a performance hit when
requesting > 512 compute units. We should really get input from a
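The arithmetic behind this point can be sketched quickly (the node count and per-node topology are taken from the original post; the worst-case sharing estimate is a rough model, not a measurement):

```python
# Cluster topology from the thread: 64 nodes, dual quad-core CPUs,
# with hyperthreading (2 hardware threads per core).
nodes = 64
sockets_per_node = 2
cores_per_socket = 4
threads_per_core = 2  # hyperthreading

real_cores = nodes * sockets_per_node * cores_per_socket   # 512 physical cores
compute_units = real_cores * threads_per_core              # 1024, what ROCKS reports

def procs_per_core(nprocs, cores):
    """Worst-case number of processes time-sharing one physical core."""
    return -(-nprocs // cores)  # ceiling division

print(real_cores, compute_units)   # 512 1024
print(procs_per_core(500, real_cores))   # 1: 500 procs fit in the real cores
print(procs_per_core(1000, real_cores))  # 2: beyond 512, procs share cores
```

This is why the 500-proc run behaves well even on 272-304 cores, while going past 512 compute units forces genuine time-sharing on every physical core.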
We have 64 compute nodes with dual quad-core, hyperthreaded CPUs, so we have
1024 compute units shown in the ROCKS 5.3 system. I'm trying to scatter an
array from the master node to the compute nodes using mpiCC and mpirun with C++.
Here is my test:
The array size is 18KB * Number of
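A minimal sketch of the kind of scatter test described above (the 18 KB per-process chunk comes from the post; the root rank, buffer contents, and file name are assumptions; compile with mpiCC and launch with mpirun):

```cpp
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, nprocs = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    // 18 KB per process, as in the post: total array = 18 KB * nprocs.
    const int chunk = 18 * 1024;
    std::vector<char> sendbuf;
    if (rank == 0)
        sendbuf.assign(static_cast<size_t>(chunk) * nprocs, 'x');  // root holds the full array
    std::vector<char> recvbuf(chunk);

    // Root (rank 0) scatters one 18 KB chunk to every process.
    MPI_Scatter(rank == 0 ? sendbuf.data() : NULL, chunk, MPI_CHAR,
                recvbuf.data(), chunk, MPI_CHAR, 0, MPI_COMM_WORLD);

    std::printf("rank %d received %d bytes\n", rank, chunk);
    MPI_Finalize();
    return 0;
}
```

The timing comparison in the thread amounts to varying -np (100 vs 500) while keeping the per-process chunk fixed, e.g. mpirun -np 500 ./scatter_test.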