Jeff,
Only the processes of the program whose process 0 succeeded in publishing the
name have srv=1 and then call MPI_Comm_accept.
The processes of the program whose process 0 failed to publish the name
have srv=0 and then call MPI_Comm_connect.
It worked like this with Open MPI 1.4.1.
Is it different in 1.5.1?
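For reference, a minimal C sketch of the pattern described above (this is an assumed reconstruction, not Bernard's actual code; the service name "testcase" is hypothetical):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, srv = 0;
        char port[MPI_MAX_PORT_NAME];
        MPI_Comm inter;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        /* Let errors return instead of aborting, so a failed publish
           can be detected. */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        if (rank == 0) {
            MPI_Open_port(MPI_INFO_NULL, port);
            if (MPI_Publish_name("testcase", MPI_INFO_NULL, port) == MPI_SUCCESS)
                srv = 1;                /* we are the server side */
            else
                MPI_Lookup_name("testcase", MPI_INFO_NULL, port);
        }
        /* All processes must take the same branch, hence the broadcast. */
        MPI_Bcast(&srv, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if (srv)
            MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
        else
            MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);

        MPI_Finalize();
        return 0;
    }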
Jeff,
The deadlock is not in MPI_Comm_accept and MPI_Comm_connect, but earlier,
in MPI_Publish_name and MPI_Lookup_name.
So the broadcast of srv is not involved in the deadlock.
Best
Bernard
I get the same deadlock with the Open MPI tests pubsub, accept, and connect
with version 1.5.1.
The accept and connect tests are OK with Open MPI 1.4.1.
I think there is a bug in version 1.5.1.
Best
Bernard
On 6 January 2011 21:10, Gilbert Grosdidier wrote:
> Hi Jeff,
>
> Where is the lstopo command located on SuSE Linux, please?
> And/or hwloc-bind, which seems related to it?
I was able to get hwloc to install quite easily on SuSE:
download, configure, and make.
Configure it to install to /usr/local/bin.
On Jan 6, 2011, at 11:23 PM, Gilbert Grosdidier wrote:
> > lstopo
> Machine (35GB)
>   NUMANode L#0 (P#0 18GB) + Socket L#0 + L3 L#0 (8192KB)
>     L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
>       PU L#0 (P#0)
>       PU L#1 (P#8)
>     L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
>       PU L#2 (P#1)
Hi Jeff,
Thanks for taking care of this.
Here is what I got on a worker node:
> mpirun --mca mpi_paffinity_alone 1 /opt/software/SGI/hwloc/1.1rc6r3028/bin/hwloc-bind --get
0x0001
Is this what is expected, please? Or should I try yet another command?
Thanks, Regards, Gilbert
On Jan 7, 2011, at 5:27 AM, John Hearns wrote:
> Actually, the topic of hyperthreading is interesting, and we should
> discuss it, please.
> Hyperthreading is supposedly implemented better and 'properly' on
> Nehalem - I would be interested to see some genuine
> performance measurements with hyperthreading.
Can you run with np=8?
Yes, here it is:
> mpirun -np 8 --mca mpi_paffinity_alone 1 /opt/software/SGI/hwloc/1.1rc6r3028/bin/hwloc-bind --get
0x0001
0x0002
0x0004
0x0008
0x0010
0x0020
0x0040
0x0080
Gilbert.
The FW version looks OK. But it may be a driver issue as well. I guess that an
OFED 1.4.x or 1.5.x driver should be OK.
To check the driver version, you can run the ofed_info command.
Regards,
Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory
You're calling bcast with root=0, so whatever value rank 0 has for srv,
everyone will have after the bcast. Plus, I didn't see in your code where *srv
was ever set to 0.
In my runs, rank 0 is usually the one that publishes first. Everyone then gets
the lookup properly, and then the bcast sends rank 0's srv value to everyone.
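As a reminder of the semantics being described (a minimal sketch; the variable name srv follows the thread):

    int srv = 0;   /* set by rank 0 before the call */
    /* After this returns, every rank in MPI_COMM_WORLD holds the value
       that rank 0 (the root) passed in. */
    MPI_Bcast(&srv, 1, MPI_INT, 0, MPI_COMM_WORLD);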
On 1/7/2011 6:49 AM, Jeff Squyres wrote:
My understanding is that hyperthreading can only be activated/deactivated at
boot time -- once the core resources are allocated to hyperthreads, they can't
be changed while running.
Whether disabling the hyperthreads or simply telling Linux not to schedule on
them makes any practical difference is another question.
+1
AFAIR (and I stopped being an IB vendor a long time ago, so I might be wrong),
the _resize_cq function being there or not is not an issue of the underlying
HCA; it's a function of what version of OFED you're running.
Well, bummer -- there goes my theory. According to the hwloc info you posted
earlier, this shows that OMPI is binding to the 1st hyperthread on each core;
*not* to both hyperthreads on a single core. :-\
It would still be slightly interesting to see if there's any difference when
you run with --bind-to-core.
I'll very soon give Hyperthreading a try with our app,
and keep you posted about the improvements, if any.
Our current cluster is made of 4-core dual-socket Nehalem nodes.
Cheers, Gilbert.
srv = 0 is set in my main program.
I call Bcast because all the processes must call MPI_Comm_accept
(collective) or must call MPI_Comm_connect (collective).
Anyway, I also get a deadlock with your lookup program.
This is what I do:
ompi-server -r URIfile
mpirun -np 1 -ompi-server file:URIfile
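For context, a typical two-sided session with a standalone ompi-server might look like this (the program names below are placeholders, not from the original message):

    ompi-server -r URIfile
    mpirun -np 1 -ompi-server file:URIfile ./server_prog
    mpirun -np 1 -ompi-server file:URIfile ./client_prog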
Hello Pavel,
Here is the output of the ofed_info command :
==
OFED-1.4.1
libibverbs:
git://git.openfabrics.org/ofed_1_4/libibverbs.git ofed_1_4
commit b00dc7d2f79e0660ac40160607c9c4937a895433
libmthca:
git://git.kernel.org/pub/scm/libs/infiniban
I'm still testing the Slurm integration, which seems to work fine so
far. However, I just upgraded another cluster to Open MPI 1.5 and
Slurm 2.1.15, but this machine has no InfiniBand.
If I salloc the nodes and mpirun the command, it seems to run and complete fine;
however, if I srun the command I get:
On Jan 7, 2011, at 10:41 AM, Bernard Secher - SFME/LGLS wrote:
> srv = 0 is set in my main program
> I call Bcast because all the processes must call MPI_Comm_accept (collective)
> or must call MPI_Comm_connect (collective)
Ah -- I see. I thought this was a test program where some processes were
accepting and others connecting.
On Jan 7, 2011, at 11:16 AM, Jeff Squyres wrote:
> Ok, I can replicate the hang in publish now. I'll file a bug report.
Filed here:
https://svn.open-mpi.org/trac/ompi/ticket/2681
Thanks for your persistence!
--
Jeff Squyres
jsquy...@cisco.com
Unfortunately, I was unable to spot any striking difference in performance
when using --bind-to-core.
Sorry. Any other suggestion?
Regards, Gilbert.
Ralph Castain wrote:
> Afraid not - though you could alias your program name to be "nice --10 prog"
Is there an OMPI wish list? If so, can we please add to it "a method
to tell mpirun what nice values to use when it starts programs on
nodes"? Minimally, something like this:
--nice 12
Gilbert Grosdidier wrote:
Any other suggestion?
Can any more information be extracted from profiling? Here is where I
think things left off:
Eugene Loh wrote:
Gilbert Grosdidier wrote:
#               [time]   [calls]   <%mpi>   <%wall>
# MPI_Waitall
Hello,
When I run this code:
program testcase
   use mpi
   implicit none
   integer :: rank, lsize, rsize, code
   integer :: intercomm
   call MPI_INIT(code)
   call MPI_COMM_GET_PARENT(intercomm, code)
   if (intercomm == MPI_COMM_NULL) then
      ! No parent, so this is the initial process: spawn a child copy.
      ! The original message is truncated here; the remaining arguments
      ! and the rest of the program are an assumed reconstruction.
      call MPI_COMM_SPAWN("./testcase", MPI_ARGV_NULL, 1, MPI_INFO_NULL, &
                          0, MPI_COMM_SELF, intercomm, MPI_ERRCODES_IGNORE, code)
   end if
   call MPI_COMM_RANK(intercomm, rank, code)
   call MPI_COMM_SIZE(intercomm, lsize, code)
   call MPI_COMM_REMOTE_SIZE(intercomm, rsize, code)
   call MPI_FINALIZE(code)
end program testcase
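A typical way to build and launch this (assuming the source file is named testcase.f90 and the environment allows spawning):

    mpif90 -o testcase testcase.f90
    mpirun -np 1 ./testcase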