Gilbert Grosdidier wrote:
Any other suggestion?
Can any more information be extracted from profiling? Here is where I
think things left off:
Eugene Loh wrote:
Gilbert Grosdidier wrote:
# [time] [calls] <%mpi> <%wall>
# MPI_Waitall [...]
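One way to get more out of that profile without a full tracing tool (a
minimal sketch, not something posted in the thread; the request array
and count stand in for the application's own) is to bracket the
MPI_Waitall calls with MPI_Wtime and compare the totals across ranks:

  #include <mpi.h>
  #include <stdio.h>

  static double waitall_time = 0.0;

  /* drop-in wrapper: accumulate wall time spent waiting */
  static void timed_waitall(int nreq, MPI_Request *reqs)
  {
      double t0 = MPI_Wtime();
      MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);
      waitall_time += MPI_Wtime() - t0;
  }

  /* call once just before MPI_Finalize */
  static void report_waitall(void)
  {
      double tmax, tmin;
      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Reduce(&waitall_time, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
      MPI_Reduce(&waitall_time, &tmin, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
      if (rank == 0)
          printf("MPI_Waitall: min %.2f s, max %.2f s across ranks\n",
                 tmin, tmax);
  }

A large max/min spread would point at load imbalance between ranks
rather than at slow communication itself.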
Unfortunately, I was unable to spot any striking difference in
performance when using --bind-to-core. Sorry. Any other suggestion?
Regards, Gilbert.
On 7 Jan 2011 at 16:32, Jeff Squyres wrote:
Well, bummer -- there goes my theory. According to the hwloc info
you posted earlier, this shows that OMPI is binding to the 1st
hyperthread on each core; *not* to both hyperthreads on a single core.
I'll very soon try using hyperthreading with our app,
and keep you posted about the improvements, if any.
Our current cluster is made up of dual-socket 4-core Nehalem nodes.
Cheers, Gilbert.
On 7 Jan 2011 at 16:17, Tim Prince wrote:
On 1/7/2011 6:49 AM, Jeff Squyres wrote:
Well, bummer -- there goes my theory. According to the hwloc info you posted
earlier, this shows that OMPI is binding to the 1st hyperthread on each core;
*not* to both hyperthreads on a single core. :-\
It would still be slightly interesting to see if there's any difference when
you run with --bind-to-core.
On 1/7/2011 6:49 AM, Jeff Squyres wrote:
My understanding is that hyperthreading can only be activated/deactivated at
boot time -- once the core resources are allocated to hyperthreads, they can't
be changed while running.
Whether disabling the hyperthreads or simply telling Linux not to schedule
on them [...]
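For what it's worth, the "telling Linux not to schedule on them" option
does not need a reboot: on Linux a logical CPU can be taken out of the
scheduler's view through the standard sysfs hotplug file. A hedged
sketch, not from the thread; it needs root, and it assumes cpu8 is the
sibling of cpu0, as in the lstopo output further down where P#0 and
P#8 share Core L#0:

  #include <stdio.h>

  /* take logical CPU 8 offline; write "1" instead to bring it back */
  int main(void)
  {
      FILE *f = fopen("/sys/devices/system/cpu/cpu8/online", "w");
      if (!f) { perror("fopen"); return 1; }
      fputs("0\n", f);
      return fclose(f) == 0 ? 0 : 1;
  }

Repeating this for cpu9 through cpu15 would leave only the first
hyperthread of each core schedulable.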
Yes, here it is:
> mpirun -np 8 --mca mpi_paffinity_alone 1 /opt/software/SGI/hwloc/1.1rc6r3028/bin/hwloc-bind --get
0x0001
0x0002
0x0004
0x0008
0x0010
0x0020
0x0040
0x0080
Gilbert.
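Each mask above is a cpuset with exactly one bit set, i.e. rank k is
bound to PU P#k. Given the PU numbering in the lstopo output further
down (P#0-P#7 are the first hyperthreads, P#8-P#15 the second), 0x0001
through 0x0080 cover the first hyperthread of each of the 8 cores. A
throwaway decoder for such masks, offered purely as an illustration:

  #include <stdio.h>
  #include <stdlib.h>

  /* print the PU indices covered by a hex cpuset mask,
     e.g. "./decode 0x0040" prints "0x0040 -> PU#6" */
  int main(int argc, char **argv)
  {
      if (argc < 2) { fprintf(stderr, "usage: %s 0xMASK\n", argv[0]); return 1; }
      unsigned long mask = strtoul(argv[1], NULL, 16);
      printf("%s ->", argv[1]);
      for (int pu = 0; mask != 0; pu++, mask >>= 1)
          if (mask & 1UL)
              printf(" PU#%d", pu);
      printf("\n");
      return 0;
  }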
On 7 Jan 2011 at 15:50, Jeff Squyres wrote:
Can you run with np=8?
On Jan 7, 2011, at 9:49 AM, Gilbert Grosdidier wrote:
> Hi Jeff,
>
> Thanks for taking care of this.
>
> Here is what I got on a worker node:
>
> > mpirun --mca mpi_paffinity_alone 1 /opt/software/SGI/hwloc/1.1rc6r3028/bin/hwloc-bind --get
> 0x0001
>
> Is this what is expected, please? Or should I try yet another command?
On Jan 7, 2011, at 5:27 AM, John Hearns wrote:
> Actually, the topic of hyperthreading is interesting, and we should
> discuss it please.
> Hyperthreading is supposedly implemented better and 'properly' on
> Nehalem - I would be interested to see some genuine
> performance measurements with hyperthreading [...]
Hi Jeff,
Thanks for taking care of this.
Here is what I got on a worker node:
> mpirun --mca mpi_paffinity_alone 1 /opt/software/SGI/hwloc/1.1rc6r3028/bin/hwloc-bind --get
0x0001
Is this what is expected, please? Or should I try yet another command?
Thanks, Regards, Gilbert
On 6 January 2011 21:10, Gilbert Grosdidier wrote:
> Hi Jeff,
>
> Where is the lstopo command located on SuseLinux, please?
> And/or hwloc-bind, which seems related to it?
I was able to get hwloc to install quite easily on SuSE -
download/configure/make.
Configure it to install to /usr/local/bin. [...]
Hi Jeff,
Here is the output of lstopo on one of the workers (thanks Jean-Christophe):
> lstopo
Machine (35GB)
  NUMANode L#0 (P#0 18GB) + Socket L#0 + L3 L#0 (8192KB)
    L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#8)
    L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
      PU L#2 (P#1)
      [...]
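The check lstopo is doing here can also be made programmatically; a
small sketch against hwloc's C API (assuming hwloc 1.x, the series used
elsewhere in this thread) that reports whether PUs outnumber cores:

  #include <hwloc.h>
  #include <stdio.h>

  int main(void)
  {
      hwloc_topology_t topo;
      hwloc_topology_init(&topo);
      hwloc_topology_load(topo);
      /* more PUs than cores means hyperthreading is enabled */
      int cores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
      int pus   = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU);
      printf("%d cores, %d PUs -> hyperthreading %s\n",
             cores, pus, pus > cores ? "enabled" : "disabled");
      hwloc_topology_destroy(topo);
      return 0;
  }

On the node above this would print "8 cores, 16 PUs -> hyperthreading
enabled".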
Yes Jeff, I'm pretty sure indeed that hyperthreading is enabled, since
16 CPUs are visible in the /proc/cpuinfo pseudo-file, while it's an
8-core Nehalem node.
However, I always carefully checked that only 8 processes are running
on each node.
Could it be that they are assigned to 8 hyperthreads [...]
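One direct way to answer that question (a sketch, not from the thread;
sched_getcpu() is glibc/Linux-specific) is to have every rank report
the logical CPU it is executing on. With this node's numbering, eight
distinct values in 0-7 mean one rank per core; a pair like 3 and 11
would mean two ranks sharing a core:

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      /* which logical CPU (PU) is this rank on right now? */
      printf("rank %d on CPU %d\n", rank, sched_getcpu());
      MPI_Finalize();
      return 0;
  }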
On Jan 6, 2011, at 4:10 PM, Gilbert Grosdidier wrote:
> Where is the lstopo command located on SuseLinux, please?
'fraid I don't know anything about Suse... :-(
It may be named hwloc-ls...?
> And/or hwloc-bind, which seems related to it?
hwloc-bind is definitely related, but it's a different utility [...]
Hi Jeff,
Where is the lstopo command located on SuseLinux, please?
And/or hwloc-bind, which seems related to it?
Thanks, G.
On 06/01/2011 21:21, Jeff Squyres wrote:
(now that we're back from vacation)
Actually, this could be an issue. Is hyperthreading enabled on your machine?
Can you send the text output from running hwloc's "lstopo" command on your
compute nodes?
(now that we're back from vacation)
Actually, this could be an issue. Is hyperthreading enabled on your machine?
Can you send the text output from running hwloc's "lstopo" command on your
compute nodes?
I ask because if hyperthreading is enabled, OMPI might be assigning one process
per *hyperthread* rather than one per *core* [...]
Hi David,
Yes, I set mpi_paffinity_alone to 1. Is that right and sufficient, please?
Thanks for your help, Best, G.
On 22/12/2010 20:18, David Singleton wrote:
Is the same level of process and memory affinity or binding being used?
On 12/21/2010 07:45 AM, Gilbert Grosdidier wrote:
Is the same level of process and memory affinity or binding being used?
On 12/21/2010 07:45 AM, Gilbert Grosdidier wrote:
Yes, there is definitely only 1 process per core with both MPI implementations.
Thanks, G.
On 20/12/2010 20:39, George Bosilca wrote:
Are your processes placed the same way with the two MPI implementations?
Gilbert Grosdidier wrote:
Good evening, Eugene,
Good morning where I am.
Here follows some output for a 1024 core run.
Assuming this corresponds meaningfully with your original e-mail, 1024
cores means performance of 700 vs 900. So, that looks roughly
consistent with the 28% MPI time you show here: losing 28% of the wall
time to MPI would scale 900 down to about 0.72 * 900 ≈ 650, in the
same ballpark as 700.
Good evening, Eugene,
First, thanks for trying to help me.
I already gave a try to a profiling tool, namely IPM, which is rather
simple to use. Here follows some output for a 1024 core run.
Unfortunately, I'm as yet unable to produce the equivalent MPT chart.
#IPMv0.983#
Can you isolate a bit more where the time is being spent? The
performance effect you're describing appears to be drastic. Have you
profiled the code? Some choices of tools can be found in the FAQ
http://www.open-mpi.org/faq/?category=perftools The results may be
"uninteresting" (all time sp
There is indeed a high rate of communication. But the buffer
size is always the same for a given pair of processes, and I thought
that mpi_leave_pinned should avoid freeing the memory in this case.
Am I wrong?
Thanks, Best, G.
On 21/12/2010 18:52, Matthieu Brucher wrote:
Don't forget that MPT has some optimizations OpenMPI may not have, such as
"overriding" free(). This way, MPT can have a huge performance boost
if you're allocating and freeing memory, and the same happens if you
communicate often.
Matthieu
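Matthieu's point in code form, as a hedged sketch rather than anything
from the thread (buffer size, step count and the even/odd rank pairing
are invented): with registration caching such as mpi_leave_pinned, a
buffer allocated once is pinned once and then reused, while a
malloc/free per iteration forces the memory to be re-registered every
time, which is exactly where an MPI that intercepts free() can win:

  #include <mpi.h>
  #include <stdlib.h>

  enum { N = 1 << 20, STEPS = 100 };

  /* stand-in for the application's repeated exchange with a fixed peer */
  static void exchange(double *buf, int peer)
  {
      MPI_Sendrecv_replace(buf, N, MPI_DOUBLE, peer, 0, peer, 0,
                           MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  }

  int main(int argc, char **argv)
  {
      int rank;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      int peer = rank ^ 1;            /* assumes an even number of ranks */

      /* cache-friendly: one buffer, registered once, reused every step */
      double *buf = calloc(N, sizeof *buf);
      for (int s = 0; s < STEPS; s++)
          exchange(buf, peer);
      free(buf);

      /* cache-hostile: a fresh buffer each step must be pinned anew */
      for (int s = 0; s < STEPS; s++) {
          double *tmp = calloc(N, sizeof *tmp);
          exchange(tmp, peer);
          free(tmp);
      }

      MPI_Finalize();
      return 0;
  }

If the buffers really are reused as described above, mpi_leave_pinned
should already be hitting the friendly case.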
2010/12/21 Gilbert Grosdidier :
Hi George,
Thanks for your help. The bottom line is that the processes are
neatly placed on the nodes/cores,
as far as I can tell from the map:
[...]
Process OMPI jobid: [33285,1] Process rank: 4
Process OMPI jobid: [33285,1] Process rank: 5
Process OMPI jobid: [33285,1] [...]
That's a first step. My question was more related to the process overlay on the
cores. If the MPI implementation places one process per node, then rank k and
rank k+1 will always be on separate nodes, and the communications will have to
go over IB. In the opposite case, if the MPI implementation places consecutive
ranks on the same node, they can communicate through shared memory [...]
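A quick way to see which of the two layouts a run actually gets (an
illustrative snippet, not something posted in the thread) is to print
the rank-to-host mapping and check whether consecutive ranks report
the same host:

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, len;
      char host[MPI_MAX_PROCESSOR_NAME];
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Get_processor_name(host, &len);
      /* per-core placement: ranks k and k+1 share a host;
         per-node placement: they land on different hosts */
      printf("rank %d on %s\n", rank, host);
      MPI_Finalize();
      return 0;
  }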
Yes, there is definitely only 1 process per core with both MPI
implementations.
Thanks, G.
On 20/12/2010 20:39, George Bosilca wrote:
Are your processes placed the same way with the two MPI implementations?
Per-node vs. per-core?
george.
On Dec 20, 2010, at 11:14, Gilbert Grosdidier wrote:
Are your processes placed the same way with the two MPI implementations?
Per-node vs. per-core?
george.
On Dec 20, 2010, at 11:14 , Gilbert Grosdidier wrote:
> Hello,
>
> I am now at a loss with my running of OpenMPI (namely 1.4.3)
> on a SGI Altix cluster with 2048 or 4096 cores, running over Infiniband.
Hello,
I am now at a loss with my running of OpenMPI (namely 1.4.3)
on a SGI Altix cluster with 2048 or 4096 cores, running over Infiniband.
After fixing several rather obvious failures with Ralph's, Jeff's and
John's help,
I am now facing the bottom of this story, since:
- there are no more obvious [...]