[OMPI users] Checksuming in openmpi 1.4.1

2010-08-31 Thread Gilbert Grosdidier
I set to activate it? - Is there a time penalty for using it, please? Thanks in advance for any help. Regards, Gilbert.
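A later thread in this archive ("Use of -mca pml csum") suggests the checksumming support is exposed as the csum PML component. A minimal sketch of how one might check for it and enable it at run time, assuming the component was built into this 1.4.x installation (the application name and process count are placeholders):

    # check whether the csum PML component is present in this build
    ompi_info | grep -i csum

    # run with checksumming enabled on the point-to-point layer
    mpirun --mca pml csum -np 512 ./my_app

Checksumming adds per-message work on both sender and receiver, so some time penalty is expected; timing a short run with and without the option is the safest way to quantify it on a given cluster.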

[OMPI users] Trouble with Memlock when using OpenIB on an SGI ICE Cluster (fwd)

2010-11-20 Thread Gilbert Grosdidier
Hello, I am afraid I got a weird issue when running an OpenMPI job using OpenIB on an SGI ICE cluster with 4096 cores (or larger), and the FAQ does not help. The OMPI version is 1.4.1, and it is running just fine with a smaller number of cores (up to 512). The error message is the
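For memlock problems with the openib BTL, the usual first check is the locked-memory limit actually seen by the MPI processes on the compute nodes. A minimal sketch, assuming administrative access and that the limit is not already forced by the resource manager:

    # on a compute node, as the user that runs the job
    ulimit -l            # should report 'unlimited' (or a very large value)

    # if it is small, raise it in /etc/security/limits.conf, e.g.:
    #   * soft memlock unlimited
    #   * hard memlock unlimited
    # then make sure the daemon that spawns the MPI processes (sshd, the
    # PBS/SGE MOM, ...) is restarted so it inherits the new limit.

Since the failure only appears at 4096 cores and beyond, the limit itself may already be fine; the preview is cut off before the exact error text, so this is only the standard starting point.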

Re: [OMPI users] mpool_sm_max_size disappeared ?

2010-11-29 Thread Gilbert Grosdidier
Hello, I found the parameter mpool_sm_max_size in this post: http://www.open-mpi.org/community/lists/devel/2008/11/4883.php but I was unable to find it in the 'ompi_info -all' output for v1.4.3. Does it still exist? If not, which parameter replaces it, please? Also, is
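A quick way to see which shared-memory sizing parameters a given build actually exposes is to dump all MCA parameters and filter on the sm mpool and sm BTL prefixes; a sketch (parameter names changed between the 1.3.x and 1.4.x series, so treat the grep patterns as a starting point rather than a definitive list):

    ompi_info --all | grep -i mpool_sm
    ompi_info --all | grep -i btl_sm

Whatever shows up there is the authoritative set of tunables for that particular installation.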

[OMPI users] Trouble with IPM & OpenMPI on SGI Altix

2010-12-08 Thread Gilbert Grosdidier
Hello, I have trouble trying to compile & run IPM on an SGI Altix cluster. The issue is: the cluster provides a default SGI MPT implementation of MPI, but I want to use a private installation of OpenMPI 1.4.3 instead. 1) When I compile IPM as recommended, everything works fine,
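When a cluster defaults to SGI MPT, one common way to make a tool like IPM build and run against a private Open MPI is to put that installation first in the environment before compiling and launching; a hedged sketch (the install prefix is hypothetical):

    export OMPI=/path/to/openmpi-1.4.3
    export PATH=$OMPI/bin:$PATH
    export LD_LIBRARY_PATH=$OMPI/lib:$LD_LIBRARY_PATH
    which mpicc mpirun     # both should now resolve to the private Open MPI
    # then configure/build IPM with this mpicc, and launch with this mpirun

This only addresses the risk of silently compiling or linking against MPT; it says nothing about IPM's own build options.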

[OMPI users] Use of -mca pml csum

2010-12-14 Thread Gilbert Grosdidier
Hello, Since I'm very suspicious about the condition of the IB network on my cluster, I'm trying to use the csum pml feature of OMPI (1.4.3). But I have a question: what happens if the checksum is different on both ends? Is there a warning printed, a flag set by the MPI_(I)recv or

Re: [OMPI users] jobs with more than 2,500 processes will not even start

2010-12-14 Thread Gilbert Grosdidier

[OMPI users] Issue with : btl_openib.c (OMPI 1.4.3)

2010-12-15 Thread Gilbert Grosdidier
Hello, Running with OpenMPI 1.4.3 on an SGI Altix cluster with 2048 cores, I got this error message on all cores, right at startup: btl_openib.c:211:adjust_cq] cannot resize completion queue, error: 12 What could be the culprit, please? Is there a workaround? What parameter is to be
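Error 12 is ENOMEM coming back from the verbs layer, so one thing worth checking before tuning Open MPI itself is how large a completion queue the HCA can actually provide; a sketch using a standard OFED utility (the device and its limits differ per node type):

    ibv_devinfo -v | grep -i -e max_cqe -e max_cq

The openib BTL sizes its completion queues from the number of connections, so at 2048+ cores the requested size can exceed what max_cqe allows; comparing the two numbers is a reasonable first diagnostic.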

[OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix

2010-12-15 Thread Gilbert Grosdidier
Hello, Running with OpenMPI 1.4.3 on an SGI Altix cluster with 4096 cores, I got this error message, right at startup: mca_oob_tcp_peer_recv_connect_ack: received unexpected process identifier [[13816,0],209] and the whole job then spins for an undefined period, without

Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix

2010-12-15 Thread Gilbert Grosdidier
18992 is indeed the master one on r36i3n15. Thanks, Best, G. On Dec 15, 2010, at 1:05 AM, Gilbert Grosdidier wrote: Hello, Running with OpenMPI 1.4.3 on an SGI Altix cluster with 4096 cores, I got this error message, right at startup: mca_oob_tcp_peer_recv_connect_ack: received

Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix

2010-12-15 Thread Gilbert Grosdidier
Good evening Ralph, On 15/12/2010 18:45, Ralph Castain wrote: It looks like all the messages are flowing within a single job (all three processes mentioned in the error have the same identifier). The only possibility I can think of is that somehow you are reusing ports - is it possible your system

Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix

2010-12-15 Thread Gilbert Grosdidier
of nodes (1k nodes, i.e. 8k cores) that I could ask him (her) about the right setup? Thanks, Best, G. On 15/12/2010 21:03, Ralph Castain wrote: On Dec 15, 2010, at 12:30 PM, Gilbert Grosdidier wrote: Good evening Ralph, On 15/12/2010 18:45, Ralph Castain wrote: It looks like all

Re: [OMPI users] Issue with : mca_oob_tcp_peer_recv_connect_ack on SGI Altix

2010-12-16 Thread Gilbert Grosdidier
Hello Jeff, On 16/12/2010 01:40, Jeff Squyres wrote: On Dec 15, 2010, at 3:24 PM, Ralph Castain wrote: I am not using the TCP BTL, only the openib one. Does this change the number of sockets in use per node, please? I believe the openib btl opens sockets for connection purposes, so the
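Since the TCP bootstrap traffic at start-up seems to be the fragile part here, one possible mitigation is to pin that traffic to a single, known-good interface; a hedged sketch (the interface name ib0 is an assumption, and the parameter name should be confirmed against this 1.4.3 build with ompi_info before relying on it):

    ompi_info --param oob tcp | grep if_include
    mpirun --mca oob_tcp_if_include ib0 ...

This only narrows which interface the out-of-band traffic uses; it does not change how many sockets each node opens.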

Re: [OMPI users] Issue with : btl_openib.c (OMPI 1.4.3)

2010-12-17 Thread Gilbert Grosdidier
by the configure step. Thanks, Best, G. On Dec 15, 2010, at 08:59, Gilbert Grosdidier wrote: Hello, Running with OpenMPI 1.4.3 on an SGI Altix cluster with 2048 cores, I got this error message on all cores, right at startup: btl_openib.c:211:adjust_cq] cannot resize completion queue

Re: [OMPI users] Issue with : btl_openib.c (OMPI 1.4.3)

2010-12-17 Thread Gilbert Grosdidier
Hello John, First, thanks for your feedback. On Dec 17, 2010, at 16:13, John Hearns wrote: On 17 December 2010 14:45, Gilbert Grosdidier <gilbert.grosdid...@cern.ch> wrote: Hello, About this issue, for which I got NO feedback ;-) Gilbert, as you have an SGI cluster, have you

Re: [OMPI users] Issue with : btl_openib.c (OMPI 1.4.3)

2010-12-17 Thread Gilbert Grosdidier
John, Thanks, more info below. On 17/12/2010 17:32, John Hearns wrote: On 17 December 2010 15:47, Gilbert Grosdidier <gilbert.grosdid...@cern.ch> wrote: gg= I don't know, and firmware_revs does not seem to be available. The only thing I got on a worker node was with lspci: If y

[OMPI users] Running OpenMPI on SGI Altix with 4096 cores : very poor performance

2010-12-20 Thread Gilbert Grosdidier
Hello, I am now at a loss with my running of OpenMPI (namely 1.4.3) on an SGI Altix cluster with 2048 or 4096 cores, running over InfiniBand. After fixing several rather obvious failures with Ralph's, Jeff's and John's help, I am now facing the bottom of this story since: - there are no more

Re: [OMPI users] Running OpenMPI on SGI Altix with 4096 cores : very poor performance

2010-12-20 Thread Gilbert Grosdidier

Re: [OMPI users] Running OpenMPI on SGI Altix with 4096 cores: very poor performance

2010-12-21 Thread Gilbert Grosdidier
with the --byslot --bynode options to see how this affects the performance of your application. For the hardcore cases we provide a rankfile feature. More info at: http://www.open-mpi.org/faq/?category=tuning#using-paffinity Enjoy, george. On Dec 20, 2010, at 15:45, Gilbert Grosdidier wrote: Yes
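For the rankfile feature George mentions, the FAQ linked above gives the exact syntax; a minimal hedged sketch of what such a file and its use can look like (the node names and slot layout here are purely illustrative):

    # myrankfile
    rank 0=r36i3n15 slot=0
    rank 1=r36i3n15 slot=1
    rank 2=r36i3n16 slot=0
    rank 3=r36i3n16 slot=1

    mpirun -np 4 -rf myrankfile ./my_app

Each line pins one MPI rank to a host and a socket/core slot, which is the finer-grained alternative to the --byslot/--bynode mappings.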

Re: [OMPI users] Running OpenMPI on SGI Altix with 4096 cores: very poor performance

2010-12-22 Thread Gilbert Grosdidier
forget that MPT has some optimizations OpenMPI may not have, such as "overriding" free(). This way, MPT can get a huge performance boost if you're allocating and freeing memory, and the same happens if you communicate often. Matthieu 2010/12/21 Gilbert Grosdidier <gilbert.grosdid...@cern.ch

Re: [OMPI users] Running OpenMPI on SGI Altix with 4096 cores: very poor performance

2010-12-22 Thread Gilbert Grosdidier
or "lots of long messages", etc. It sounds like there is some repeated set of MPI exchanges, so maybe that set can be extracted and run without the complexities of the application. Anyhow, some profiling might help guide one to the problem. Gilbert Grosdidier wrote: Ther

Re: [OMPI users] Running OpenMPI on SGI Altix with 4096 cores : very poor performance

2010-12-22 Thread Gilbert Grosdidier
Hi David, Yes, I set mpi_paffinity_alone to 1. Is that right and sufficient, please? Thanks for your help, Best, G. On 22/12/2010 20:18, David Singleton wrote: Is the same level of process and memory affinity or binding being used? On 12/21/2010 07:45 AM, Gilbert Grosdidier

Re: [OMPI users] Trouble with Memlock when using OpenIB on an SGI ICE Cluster

2010-12-31 Thread Gilbert Grosdidier
all-to-all communication is not required on a big cluster. Could someone comment on this? More info on request. Thanks, Happy New Year to you all, G. On 29/11/2010 16:58, Gilbert Grosdidier wrote: Hello John, Thanks for your feedback, but my investigations so far did

Re: [OMPI users] Granular locks?

2011-01-05 Thread Gilbert Grosdidier
Hi Gijsbert, Thank you for this proposal, I think it could be useful for our LQCD application, at least for further evaluation. How could I get the code, please? Thanks in advance for your help, Best, G. On 03/01/2011 22:36, Gijsbert Wiesenekker wrote: On Oct 2, 2010, at

Re: [OMPI users] Running OpenMPI on SGI Altix with 4096 cores : very poor performance

2011-01-06 Thread Gilbert Grosdidier
at 2:25 PM, Gilbert Grosdidier wrote: Hi David, Yes, I set mpi_paffinity_alone to 1. Is that right and sufficient, please? Thanks for your help, Best, G. On 22/12/2010 20:18, David Singleton wrote: Is the same level of process and memory affinity or binding being used? On 12/21/201

Re: [OMPI users] Running OpenMPI on SGI Altix with 4096 cores : very poor performance

2011-01-06 Thread Gilbert Grosdidier
P#7) PU L#15 (P#15) Tests with --bind-to-core are under way ... What is your conclusion, please? Thanks, G. On 06/01/2011 23:16, Jeff Squyres wrote: On Jan 6, 2011, at 5:07 PM, Gilbert Grosdidier wrote: Yes Jeff, I'm pretty sure indeed that hyperthreading is enabled, since 16 C
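For checking what the --bind-to-core tests actually did, the 1.4 series can print the binding each rank received; a small sketch (the process count is a placeholder, and the option spellings can be confirmed with mpirun --help on this install):

    mpirun -np 16 --bind-to-core --report-bindings ./my_app 2>&1 | head

With hyperthreading enabled, binding to physical cores rather than logical PUs is usually what you want for this kind of measurement.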

Re: [OMPI users] Running OpenMPI on SGI Altix with 4096 cores : very poor performance

2011-01-07 Thread Gilbert Grosdidier
Regards, Gilbert. On Jan 7, 2011, at 15:35, Jeff Squyres wrote: On Jan 6, 2011, at 11:23 PM, Gilbert Grosdidier wrote: lstopo Machine (35GB) NUMANode L#0 (P#0 18GB) + Socket L#0 + L3 L#0 (8192KB) L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 PU L#0 (P#0) PU L#1 (P#8) L2 L#1 (256KB) + L1 L#1 (3

Re: [OMPI users] Running OpenMPI on SGI Altix with 4096 cores : very poor performance

2011-01-07 Thread Gilbert Grosdidier
p=8? On Jan 7, 2011, at 9:49 AM, Gilbert Grosdidier wrote: Hi Jeff, Thanks for taking care of this. Here is what I got on a worker node: mpirun --mca mpi_paffinity_alone 1 /opt/software/SGI/hwloc/1.1rc6r3028/bin/hwloc-bind --get 0x0001 Is this what is expected, please? Or should I try

Re: [OMPI users] Running OpenMPI on SGI Altix with 4096 cores : very poor performance

2011-01-07 Thread Gilbert Grosdidier

Re: [OMPI users] Issue with : btl_openib.c (OMPI 1.4.3)

2011-01-07 Thread Gilbert Grosdidier
:30 PM, Gilbert Grosdidier wrote: John, Thanks, more info below. On 17/12/2010 17:32, John Hearns wrote: On 17 December 2010 15:47, Gilbert Grosdidier <gilbert.grosdid...@cern.ch> wrote: gg= I don't know, and firmware_revs does not seem to be available. The only thing I got on a worke

Re: [OMPI users] Running OpenMPI on SGI Altix with 4096 cores : very poor performance

2011-01-07 Thread Gilbert Grosdidier
with np=8? On Jan 7, 2011, at 9:49 AM, Gilbert Grosdidier wrote: Hi Jeff, Thanks for taking care of this. Here is what I got on a worker node: mpirun --mca mpi_paffinity_alone 1 /opt/software/SGI/hwloc/1.1rc6r3028/bin/hwloc-bind --get 0x0001 Is this what is expected, please? Or should I

Re: [OMPI users] Problems on large clusters

2011-06-21 Thread Gilbert Grosdidier

Re: [OMPI users] Problems on large clusters

2011-06-22 Thread Gilbert Grosdidier
job. I use 255 nodes with one MPI task on each node and use 8-way OpenMP. I don't need -np and -machinefile, because mpiexec picks up this information from PBS. Thorsten On Tuesday, June 21, 2011, Gilbert Grosdidier wrote: Hello Thorsten, Could you please be a little bit more specific
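A hedged sketch of the kind of PBS setup Thorsten describes (one MPI task per node, 8 OpenMP threads each); the resource-request syntax shown is PBS Pro style and the application name is a placeholder, so adapt both to the local batch system:

    #PBS -l select=255:ncpus=8:mpiprocs=1
    export OMP_NUM_THREADS=8
    mpiexec ./my_hybrid_app

With mpiprocs=1 in the selection, the launcher gets the host list and process count from PBS, which is why no -np or -machinefile is needed.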

[OMPI users] Working with a CellBlade cluster

2008-10-19 Thread Gilbert Grosdidier
Working with a CellBlade cluster (QS22), the requirement is to have one instance of the executable running on each socket of the blade (there are 2 sockets). The application is of the 'domain decomposition' type, and each instance is required to frequently send/receive data to/from both the remote blades

Re: [OMPI users] Working with a CellBlade cluster

2008-10-29 Thread Gilbert Grosdidier
g processes evenly between sockets by itself. There is still no formal FAQ due to multiple reasons, but you can read how to use it in the attached scratch (there were a few name changes of the params, so check with ompi_info). Shared memory is used between pr

Re: [OMPI users] problem running Open MPI on Cells

2008-10-31 Thread Gilbert Grosdidier

Re: [OMPI users] Working with a CellBlade cluster

2008-10-31 Thread Gilbert Grosdidier
One way to check if the message goes via IB or SM may be to check the counters in /sys/class/infiniband. Regards, Mi Gilbert Grosdidier <gro...@mail.cern.ch>
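The counters Mi refers to live under sysfs on recent OFED stacks; a sketch of watching the send counter around a run (the HCA name mlx4_0 and the port number are assumptions, check with ibstat on the node):

    cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_xmit_data
    # ... run the MPI job ...
    cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_xmit_data

If the counter barely moves during a run with heavy communication, the messages are staying on-node via the sm BTL rather than going over IB.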

[OMPI users] mca btl_openib_flags default value

2008-11-04 Thread Gilbert Grosdidier
ted or not? I could understand any value between 1 & 7, but what does 54 mean, please? Does it behave like 6, after removal of the unexpected bits? Thanks, Gilbert
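54 is just a bitmask whose individual bits are defined in Open MPI's BTL header (btl.h); a quick way to see which bits are set, without asserting here what each one means:

    for b in 1 2 4 8 16 32 64; do [ $(( 54 & b )) -ne 0 ] && echo "bit $b set"; done
    # -> bits 2, 4, 16, 32   (i.e. 54 = 2 + 4 + 16 + 32)

So it is not simply 6 with extra bits removed; two higher-order flags are also switched on, and their meaning should be read from the btl.h of the installed version.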

Re: [OMPI users] Problem with feupdateenv

2008-12-10 Thread Gilbert Grosdidier
case, during non-MPI C code compilation or execution.

    # icc sample.c -o sample
    # ./sample
    Compiler is working
    #

What might be the reason for this & how can it be resolved? Thanks, Sangamesh