Re: [OMPI users] Flow control in OMPI

2011-09-13 Thread Rodrigo Silva Oliveira
>> Are you overwhelming the receiver with short, unexpected messages such that MPI keeps mallocing >> and mallocing and mallocing in an attempt to eagerly receive all the messages? I ask because Open >> MPI only eagerly sends short messages -- long messages are queued up at the sender and not >> a

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Blosch, Edwin L
We don't budget computer hours so I don't think we would use accounting, although I'm not sure I know what this capability is all about. Also, I don't care about launch speed. A few minutes means nothing when the job will take days to run. Also, I have a highly portable strategy of wrapping the

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Reuti
On 14.09.2011 at 00:29, Ralph Castain wrote: > > On Sep 13, 2011, at 4:25 PM, Reuti wrote: > >> On 13.09.2011 at 23:54, Blosch, Edwin L wrote: >> >>> This version of OpenMPI I am running was built without any guidance >>> regarding SGE in the configure command, but it was built on a system t

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Reuti
On 14.09.2011 at 00:25, Blosch, Edwin L wrote: > Your comment guided me in the right direction, Reuti. And overlapped with > your guidance, Ralph. > > It works: if I add this flag then it runs > --mca plm_rsh_disable_qrsh > > Thank you both for the explanations. > > I had built OpenMPI on a

Re: [OMPI users] OpenIB error messages: reporting the default or telling you what's happening?

2011-09-13 Thread Kevin . Buckley
> So the error output is not showing what you two think should be > the default value, 20, but then nor is it showing what I think I > have set it to globally, again, 20. > > But anyroad, what I wanted from this is confirmation that the output > is telling me the value that the job was running with

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Ralph Castain
Just to clarify: you'll still need to set that variable regardless of --without-sge or not. The launcher will still use qrsh if it is present and the SGE envars are around. On Sep 13, 2011, at 4:25 PM, Blosch, Edwin L wrote: > Your comment guided me in the right direction, Reuti. And overlapped

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Ralph Castain
On Sep 13, 2011, at 4:25 PM, Reuti wrote: > On 13.09.2011 at 23:54, Blosch, Edwin L wrote: > >> This version of OpenMPI I am running was built without any guidance >> regarding SGE in the configure command, but it was built on a system that >> did not have SGE, so I would presume support is a

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Blosch, Edwin L
Your comment guided me in the right direction, Reuti. And overlapped with your guidance, Ralph. It works: if I add this flag then it runs --mca plm_rsh_disable_qrsh Thank you both for the explanations. I had built OpenMPI on another system, as I said, it did not have SGE and thus I did not g
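
The fix confirmed in this message can be sketched as a job script for `qsub`. This is a minimal sketch, not the poster's actual script: the command line is the one quoted in the thread, the value `1` after `plm_rsh_disable_qrsh` is an assumption (the thread shows only the parameter name, and `--mca` takes a name followed by a value), and `DRY_RUN` is a hypothetical switch added here so the script prints the command by default instead of launching it.

```shell
#!/bin/sh
# Sketch of an SGE job script; paths and arguments are copied from the thread.
# Assumption: plm_rsh_disable_qrsh is given the value 1 to turn the qrsh
# launcher off, so that mpirun falls back to the agent in plm_rsh_agent.
set -- /bin/mpirun --machinefile mpihosts.dat -np 16 \
    -mca plm_rsh_agent /usr/bin/rsh \
    -mca plm_rsh_disable_qrsh 1 \
    -x MPI_ENVIRONMENT=1 ./test_setup

# DRY_RUN is hypothetical: 1 (the default) only prints the command line,
# any other value replaces this shell with mpirun.
if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$@"
else
    exec "$@"
fi
```

Submitting with `DRY_RUN=0` exported into the job environment would run the command for real; per Ralph's note in this thread, the parameter is needed whether or not Open MPI was configured with `--without-sge`.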

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Reuti
On 13.09.2011 at 23:54, Blosch, Edwin L wrote: > This version of OpenMPI I am running was built without any guidance regarding > SGE in the configure command, but it was built on a system that did not have > SGE, so I would presume support is absent. Whether SGE is installed on the built machi

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Ralph Castain
On Sep 13, 2011, at 4:15 PM, Reuti wrote: > On 14.09.2011 at 00:11, Ralph Castain wrote: > >> I believe this is one of those strange cases that can catch us. The problem >> is that we still try to use the qrsh launcher - we appear to ignore the >> --without-sge configure option (it impacts ou

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Reuti
On 14.09.2011 at 00:11, Ralph Castain wrote: > I believe this is one of those strange cases that can catch us. The problem > is that we still try to use the qrsh launcher - we appear to ignore the > --without-sge configure option (it impacts our ability to read the > allocation, but not the la

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Ralph Castain
I believe this is one of those strange cases that can catch us. The problem is that we still try to use the qrsh launcher - we appear to ignore the --without-sge configure option (it impacts our ability to read the allocation, but not the launcher). Try setting the following: -mca plm_rsh_disa

Re: [OMPI users] EXTERNAL: Re: Problem running under SGE

2011-09-13 Thread Blosch, Edwin L
This version of OpenMPI I am running was built without any guidance regarding SGE in the configure command, but it was built on a system that did not have SGE, so I would presume support is absent. My hope is that OpenMPI will not attempt to use SGE in any way. But perhaps it is trying to. Ye

Re: [OMPI users] Problem running under SGE

2011-09-13 Thread Reuti
On 13.09.2011 at 23:18, Blosch, Edwin L wrote: > I'm able to run this command below from an interactive shell window: > > /bin/mpirun --machinefile mpihosts.dat -np 16 -mca plm_rsh_agent > /usr/bin/rsh -x MPI_ENVIRONMENT=1 ./test_setup > > but it does not work if I put it into a shell script

Re: [OMPI users] Problem running under SGE

2011-09-13 Thread Reuti
On 13.09.2011 at 23:18, Blosch, Edwin L wrote: > I'm able to run this command below from an interactive shell window: > > /bin/mpirun --machinefile mpihosts.dat -np 16 -mca plm_rsh_agent > /usr/bin/rsh -x MPI_ENVIRONMENT=1 ./test_setup > > but it does not work if I put it into a shell script

[OMPI users] Problem running under SGE

2011-09-13 Thread Blosch, Edwin L
I'm able to run this command below from an interactive shell window: /bin/mpirun --machinefile mpihosts.dat -np 16 -mca plm_rsh_agent /usr/bin/rsh -x MPI_ENVIRONMENT=1 ./test_setup but it does not work if I put it into a shell script and 'qsub' that script to SGE. I get the message shown at th

Re: [OMPI users] Problem compiling openmpi-1.4.3

2011-09-13 Thread Gus Correa
Hi Amos, do you mean './configure' instead of './compile'? Also, not sure if LIBDIRS is used by the OpenMPI configure script. The second error (cannot load libimf.so) may be because you need to set your Intel compiler environment. It is easier to put it in your .cshrc/.bashrc file. So

Re: [OMPI users] Problem compiling openmpi-1.4.3

2011-09-13 Thread Rayson Ho
Did you notice the error message: /usr/bin/install: cannot remove `/opt/openmpi/share/openmpi/amca-param-sets/example.conf': Permission denied I would check the permission settings of the file first if I encounter something like this... Rayson = Grid Engine / O

[OMPI users] Problem compiling openmpi-1.4.3

2011-09-13 Thread amosl...@gmail.com
Dear Users, I have run into a problem trying to compile openmpi-1.4.3. I am running SuSE Linux 11.4 in VMware-7.0.1. For compilers I am using l_fcompxe_intel64_2011.5.220 and l_ccompxe_intel64_2011.5.220 which are newly issued. It appears to go through the compile command:

Re: [OMPI users] OpenMPI Nonblocking Send/Recv

2011-09-13 Thread Rayson Ho
Hi Xin, Since it is not Open MPI specific, you might want to try to work with the SciNet guys first. The "SciNet Research Computing Consulting Clinic" is specifically formed to help U of T students & researchers develop and design compute-intensive programs. http://www.scinet.utoronto.ca/ http://

[OMPI users] OpenMPI Nonblocking Send/Recv

2011-09-13 Thread Xin Tong Utoronto
I am new to openmpi. I am not sure whether my logic below will work or not. Can someone please confirm that for me? Basically, what this does is check whether there is anything to send; if there is, send it right away and set sentinit to true. Then check whether there is anything t

Re: [OMPI users] EXTERNAL: Re: Question on using rsh

2011-09-13 Thread Ralph Castain
My bad - I thought you were talking about the 1.5 series :-( orte_rsh_agent does not exist in the 1.4 series. Check "ompi_info --param plm rsh" and you'll see that it isn't there. So plm_rsh_agent is picking up your request. The other cmd line blissfully ignores the orte_rsh_agent param. On S
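
The check suggested here can be sketched in shell. This is a minimal sketch under stated assumptions: `ompi_info --param plm rsh` is the command named in the message, while `has_mca_param` is a hypothetical helper (not part of Open MPI) that simply greps that output for a parameter name.

```shell
#!/bin/sh
# has_mca_param is a hypothetical helper, not an Open MPI command.
# It searches the output of "ompi_info --param <framework> <component>"
# for a parameter name and succeeds only if the name appears.
has_mca_param() {
    framework=$1
    component=$2
    name=$3
    ompi_info --param "$framework" "$component" 2>/dev/null | grep -q "$name"
}

if command -v ompi_info >/dev/null 2>&1; then
    for p in plm_rsh_agent orte_rsh_agent; do
        if has_mca_param plm rsh "$p"; then
            echo "$p: listed"
        else
            echo "$p: not listed"
        fi
    done
else
    echo "ompi_info not found on PATH"
fi
```

A misspelled or nonexistent MCA parameter is not rejected on the mpirun command line, as noted elsewhere in this thread, so a check of this kind before launching is one way to catch the case described above.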

Re: [OMPI users] EXTERNAL: Re: Question on using rsh

2011-09-13 Thread Blosch, Edwin L
Ralph, Reuti, There are no typos, except in my post itself where I clipped out a few arguments. I just repeated the exercise this morning, exactly like this: /bin/mpirun --machinefile mpihosts.dat -np 16 -mca orte_rsh_agent /usr/bin/rsh -x MPI_ENVIRONMENT=1 ./test_setup It prompts for a passw

Re: [OMPI users] IO issue with OpenMPI 1.4.1 and earlier versions

2011-09-13 Thread Rob Latham
On Mon, Sep 12, 2011 at 07:44:25PM -0700, Steve Jones wrote: > Hi. > > We've run into an IO issue with 1.4.1 and earlier versions. We're > able to reproduce the issue in around 120 lines of code to help. Hi Steve Jones I'm the ROMIO maintainer, and always looking for ways to improve our test co

Re: [OMPI users] Question on MPI_Ssend

2011-09-13 Thread Jeff Squyres
On Sep 13, 2011, at 8:41 AM, devendra rai wrote: > Also, I read the definition of MPI_Ssend(...) that you sent, but then it does > not explain why both MPI_Ssend(...) and MPI_Recv(...) are blocked seemingly > forever. Oh, they're blocked *forever*! Sorry; I didn't get that from your prior de

Re: [OMPI users] Question on MPI_Ssend

2011-09-13 Thread devendra rai
Hello Jeff and MPI Users, Yes, I know about the ISends. I do not want to have ISends for other reasons. The problem that bothers me is that I have one process waiting on MPI_Recv(...) and the other process on MPI_Ssend(...), but still both are blocked. This happens arbitrarily. At other ti

Re: [OMPI users] IO issue with OpenMPI 1.4.1 and earlier versions

2011-09-13 Thread Jeff Squyres
On Sep 12, 2011, at 10:44 PM, Steve Jones wrote: > We've run into an IO issue with 1.4.1 and earlier versions. We're able to > reproduce the issue in around 120 lines of code to help, I'd like to find if > there's something we're simply doing incorrectly with the build or if it's in > fact a kn

Re: [OMPI users] #cpus/socket

2011-09-13 Thread nn3003
Thanks for the info. I was thinking it could be some wrong interpretation of the per-CPU core count. I will try a newer library. __ From: "Brice Goglin" To: Open MPI Users Date: 13.09.2011 13:28 Subject: Re: [OMPI users] #cpus/socket L

Re: [OMPI users] btl_openib_ipaddr_include broken in 1.4.4rc2?

2011-09-13 Thread Jeff Squyres
Mike -- This fix has since been included in rc3. Can you confirm that it is working for you? On Sep 4, 2011, at 2:36 AM, Yevgeny Kliteynik wrote: > On 30-Aug-11 4:50 PM, Michael Shuey wrote: >> I'm using RoCE (or rather, attempting to) and need to select a >> non-default GID to get my traffic

Re: [OMPI users] #cpus/socket

2011-09-13 Thread Brice Goglin
On 13/09/2011 18:59, Peter Kjellström wrote: > On Tuesday, September 13, 2011 09:07:32 AM nn3003 wrote: >> Hello ! >> >> I am running wrf model on 4x AMD 6172 which is 12 core CPU. I use OpenMPI >> 1.4.3 and libgomp 4.3.4. I have binaries compiled for shared-memory and >> distributed-memory (O

Re: [OMPI users] Question on MPI_Ssend

2011-09-13 Thread Jeff Squyres
On Sep 13, 2011, at 5:02 AM, devendra rai wrote: > I am using MPI_Ssend and a corresponding MPI_Recv. I notice that whenever > MPI_Recv starts waiting first, and then MPI_Ssend is posted, the MPI calls > just block. This, of course, results in non-coherent application behavior. I'm not sure wh

Re: [OMPI users] #cpus/socket

2011-09-13 Thread Peter Kjellström
On Tuesday, September 13, 2011 09:07:32 AM nn3003 wrote: > Hello ! > > I am running wrf model on 4x AMD 6172 which is 12 core CPU. I use OpenMPI > 1.4.3 and libgomp 4.3.4. I have binaries compiled for shared-memory and > distributed-memory (OpenMP and OpenMPI) I use following command > mpirun -np

[OMPI users] Question on MPI_Ssend

2011-09-13 Thread devendra rai
Hello Everyone, I have a rather large network application, which runs on a cluster of Linux machines. I am using MPI_Ssend and a corresponding MPI_Recv. I notice that whenever MPI_Recv starts waiting first, and then MPI_Ssend is posted, the MPI calls just block. This, of course, results in non-c

Re: [OMPI users] mpiexec option for node failure

2011-09-13 Thread Reuti
On 13.09.2011 at 02:43, Ralph Castain wrote: > We don't have anything similar in OMPI. There are fault tolerance modes, but > not like the one you describe. You can join mpi3-ft at http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft, there is also an archive http://lists.mpi-forum.org/mpi

Re: [OMPI users] Question on using rsh

2011-09-13 Thread Reuti
On 13.09.2011 at 02:42, Ralph Castain wrote: > The two are synonyms for each other - they resolve to the identical variable, > so there isn't anything different about them. > > Not sure what the issue might be, but I would check for a typo - we don't > check that mca params are spelled correct

[OMPI users] #cpus/socket

2011-09-13 Thread nn3003
Hello! I am running the WRF model on 4x AMD 6172, which is a 12-core CPU. I use OpenMPI 1.4.3 and libgomp 4.3.4. I have binaries compiled for shared-memory and distributed-memory (OpenMP and OpenMPI). I use the following command: mpirun -np 4 --cpus-per-proc 6 --report-bindings --bysocket wrf.exe It works

Re: [OMPI users] OpenIB error messages: reporting the default or telling you what's happening?

2011-09-13 Thread Kevin . Buckley
Pasha writes > > Actually I'm surprised that default value is 10. I think it > > used to be 20 Jeff writes: > FWIW, the default for the ib_timeout is 20 in both v1.4.x and v1.5.x. > > As Ralph said, ompi_info will show the current value -- not the default > value. Of course, the current v