[OMPI users] MPI_Gatherv error
Dear all,

I am a beginner with MPI. Right now I am trying to use MPI_GATHERV in my code; the test code just gathers the values of array A and stores them in array B, but I get the following error:

  'Fatal error in MPI_Gatherv: Invalid count, error stack:
  PMPI_Gatherv<398>: MPI_Gatherv failed
  PMPI_Gatherv<317>: Negative count, value is -842150451'

I am posting my program with this email; I wonder whether anyone can help me fix it. I guess my error comes from the sending or receiving buffer and the displacement of the stored values. I tried changing 'B,jlen,idisp' to 'B(1,1),jlen(myid),idisp(myid)' and other things, but I still cannot work it out. I am looking forward to some help from you.

Zhangping Wei

My code is:

      PROGRAM MAIN
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER I,J,IWORK,JWORK,I1,I2,J1,J2
      REAL A(16,16),B(16,16)
      INTEGER,ALLOCATABLE ::idisp(:),jlen(:)
      integer myid,numprocs,rc,ierr,istar,iend,jstar,jend
      integer status(MPI_STATUS_SIZE)

      CALL MPI_INIT(ierr)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD,myid,ierr)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD,numprocs,ierr)
!     PRINT *,'process ',myid, 'of',numprocs, 'is alive.'
      allocate(idisp(0:numprocs-1),jlen(0:numprocs-1))
      DO J=1,16
        DO I=1,16
          A(I,J)=I+J
          B(I,J)=0.0
        ENDDO
      ENDDO
      I1=1;I2=16;J1=1;J2=16
      JWORK=(J2-J1)/numprocs+1
      JSTAR=MIN(myid*JWORK+J1,J2+1)
      JEND=MIN(JSTAR+JWORK-1,J2)
      ISTAR=I1
      IEND=I2
      PRINT *,myid,istar,iend,jstar,jend
      jlen(myid)=16*(jend-jstar+1)
      idisp(myid)=16*(jstar-1)
      print *,myid,jlen(myid),idisp(myid)
      CALL MPI_GATHERV(A(1,jstar),jlen(myid),MPI_REAL,
     *B,jlen,idisp,MPI_REAL,0,MPI_COMM_WORLD,IERR)
      IF(myid.EQ.0)THEN
        DO J=1,16
          DO I=1,16
            PRINT *,I,J,B(I,J)
          ENDDO
        ENDDO
      ENDIF
      CALL MPI_Finalize(rc)
      END PROGRAM
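[Editor's note] The likely cause of the "Negative count" error above: jlen and idisp are allocated over 0..numprocs-1, but each rank fills in only its own entry (jlen(myid), idisp(myid)). MPI_Gatherv's recvcounts and displs arrays are read in full at the root, so the untouched entries contain uninitialized garbage; -842150451 is the bit pattern 0xCDCDCDCD, a classic uninitialized-memory marker. The fix is for every rank (or at least the root) to compute the whole layout. A sketch of that layout computation, written in Python for brevity (variable names mirror the Fortran; the function name is illustrative):

```python
# Every rank must fill the *entire* recvcounts/displs arrays, not just
# its own entry, because the root reads all of them in MPI_Gatherv.
def gatherv_layout(numprocs, j1=1, j2=16, nrows=16):
    jwork = (j2 - j1) // numprocs + 1
    jlen, idisp = [], []
    for rank in range(numprocs):          # loop over ALL ranks, not just myid
        jstar = min(rank * jwork + j1, j2 + 1)
        jend = min(jstar + jwork - 1, j2)
        jlen.append(nrows * (jend - jstar + 1))   # recvcounts, in REALs
        idisp.append(nrows * (jstar - 1))         # displacements, in REALs
    return jlen, idisp

jlen, idisp = gatherv_layout(4)
print(jlen, idisp)   # prints [64, 64, 64, 64] [0, 64, 128, 192]
```

The counts always sum to 16*16 = 256, the full array, so the gathered columns tile B exactly. In the Fortran code the equivalent change is to wrap the jstar/jend/jlen/idisp computation in a DO loop over all ranks.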
Re: [OMPI users] btl_openib_cpc_include rdmacm questions
On Apr 21, 2011, at 4:41 PM, Brock Palen wrote:

> Given that part of our cluster is TCP only, openib wouldn't even startup on those hosts

That is correct - it would have no impact on those hosts.

> and this would be ignored on hosts with IB adaptors?

Ummm... not sure I understand this one. The param -will- be used on hosts with IB adaptors, because that is what it is controlling. However, it -won't- have any impact on hosts without IB adaptors, which is what I suspect you meant to ask?

> Just checking thanks!
>
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> bro...@umich.edu
> (734)936-1985
>
> On Apr 21, 2011, at 6:21 PM, Jeff Squyres wrote:
>
>> Over IB, I'm not sure there is much of a drawback. It might be slightly slower to establish QP's, but I don't think that matters much.
>>
>> Over iWARP, rdmacm can cause connection storms as you scale to thousands of MPI processes.
>>
>> On Apr 20, 2011, at 5:03 PM, Brock Palen wrote:
>>
>>> We managed to have another user hit the bug that causes collectives (this time MPI_Bcast()) to hang on IB that was fixed by setting:
>>>
>>> btl_openib_cpc_include rdmacm
>>>
>>> My question is if we set this to the default on our system with an environment variable does it introduce any performance or other issues we should be aware of?
>>>
>>> Is there a reason we should not use rdmacm?
>>>
>>> Thanks!
>>>
>>> Brock Palen
>>> www.umich.edu/~brockp
>>> Center for Advanced Computing
>>> bro...@umich.edu
>>> (734)936-1985
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] btl_openib_cpc_include rdmacm questions
Given that part of our cluster is TCP only, openib wouldn't even startup on those hosts, and this would be ignored on hosts with IB adaptors? Just checking thanks!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

On Apr 21, 2011, at 6:21 PM, Jeff Squyres wrote:

> Over IB, I'm not sure there is much of a drawback. It might be slightly slower to establish QP's, but I don't think that matters much.
>
> Over iWARP, rdmacm can cause connection storms as you scale to thousands of MPI processes.
>
> On Apr 20, 2011, at 5:03 PM, Brock Palen wrote:
>
>> We managed to have another user hit the bug that causes collectives (this time MPI_Bcast()) to hang on IB that was fixed by setting:
>>
>> btl_openib_cpc_include rdmacm
>>
>> My question is if we set this to the default on our system with an environment variable does it introduce any performance or other issues we should be aware of?
>>
>> Is there a reason we should not use rdmacm?
>>
>> Thanks!
>>
>> Brock Palen
>> www.umich.edu/~brockp
>> Center for Advanced Computing
>> bro...@umich.edu
>> (734)936-1985
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] Need help buiding OpenMPI with Intel v12.0 compilers on Linux
On Apr 20, 2011, at 10:44 AM, Ormiston, Scott J. wrote:

> I originally thought the configure was fine, but now that I check through the config.log, I see that it had errors:
>
> conftest.c(49): error #2379: cannot open source file "ac_nonexistent.h"
> #include <ac_nonexistent.h>

It's normal and expected for there to be lots of errors in config.log. There are a bunch of tests in configure that are designed to succeed on some systems and fail on others. So don't read anything into the failures that you see in config.log -- unless configure itself fails. Then we generally go look at the *last* failures in config.log to start backtracking to figure out what went wrong.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
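[Editor's note] The advice above — start from the *last* failures in config.log and backtrack — can be mechanized. A minimal sketch (the function name and the sample log text are illustrative, not part of Open MPI):

```python
# List the last few "error" lines of a config.log, since the final
# failures are where backtracking should start.  Earlier errors are
# usually deliberate feature-detection tests and can be ignored.
def last_errors(log_text, n=3):
    hits = [line for line in log_text.splitlines() if "error" in line.lower()]
    return hits[-n:]

# Stand-in for a real config.log:
sample = """checking for gcc... gcc
conftest.c(49): error #2379: cannot open source file "ac_nonexistent.h"
checking whether we are cross compiling... no
configure: error: C compiler cannot create executables
"""
print(last_errors(sample))
```

In practice one would read the real file (e.g. `open("config.log").read()`); only an error that configure itself reports fatally, like the last line here, indicates a genuine problem.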
Re: [OMPI users] using openib and psm together
I believe it was mainly a startup issue -- there's a complicated sequence of events that happens during MPI_INIT. IIRC, the issue was that if OMPI had software support for PSM, it assumed that the lack of PSM hardware was effectively an error.

v1.5 made the startup sequence a little more flexible; the PSM bits in OMPI can say "Oh yes, we have PSM support, but I don't see any PSM hardware, so just ignore me... please move along... nothing to see here..."

OMPI's openib BTL has had this kind of support for a long time, but PSM and verbs are treated a little differently in the startup sequence because they're fundamentally different kinds of transports (abstraction-wise, anyway).

On Apr 21, 2011, at 6:01 AM, Dave Love wrote:

> We have an installation with both Mellanox and Qlogic IB adaptors (in distinct islands), so I built open-mpi 1.4.3 with openib and psm support.
>
> Now I've just read this in the OFED source, but I can't see any relevant issue in the open-mpi tracker:
>
> OpenMPI support
> ---
> It is recommended to use the OpenMPI v1.5 development branch. Prior versions of OpenMPI have an issue with support PSM network transports mixed with standard Verbs transport (BTL openib). This prevents an OpenMPI installation with network modules available for PSM and Verbs to work correctly on nodes with no QLogic IB hardware. This has been fixed in the latest development branch allowing a single OpenMPI installation to target IB hardware via PSM or Verbs as well as alternate transports seamlessly.
>
> Do I definitely need 1.5 (and is 1.5.3 good enough?) to have openib and psm working correctly? Also what are the symptoms of it not working correctly?

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] huge VmRSS on rank 0 after MPI_Init when using "btl_openib_receive_queues" option
Does it vary exactly according to your receive_queues specification?

On Apr 19, 2011, at 9:03 AM, Eloi Gaudry wrote:

> hello,
>
> i would like to get your input on this: when launching a parallel computation on 128 nodes using openib and the "-mca btl_openib_receive_queues P,65536,256,192,128" option, i observe a rather large resident memory consumption (2GB: 65536*256*128) on the process with rank 0 (and only this process) just after a call to MPI_Init.
>
> i'd like to know why the other processes don't behave the same:
> - other processes located on the same nodes don't use that amount of memory
> - all other processes (i.e. located on any other nodes) don't either
>
> i'm using OpenMPI-1.4.2, built with gcc-4.3.4 and '--enable-cxx-exceptions --with-pic --with-threads=posix' options.
>
> thanks for your help,
> éloi
>
> --
> Eloi Gaudry
> Senior Product Development Engineer
>
> Free Field Technologies
> Company Website: http://www.fft.be
> Direct Phone Number: +32 10 495 147

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
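[Editor's note] The 2 GB figure the user reports is exactly what a naive per-peer accounting of the receive_queues spec predicts, which is why Jeff asks whether it tracks the specification. A back-of-the-envelope sketch (the per-peer model is an assumption for illustration, not the openib BTL's exact internal accounting):

```python
# "P,65536,256,192,128" => a per-peer (P) queue of 256 receive buffers
# of 64 KiB each.  If rank 0 has eagerly connected to 128 peers, the
# posted receive buffers alone account for the observed resident memory.
buffer_size = 65536      # bytes per receive buffer
num_buffers = 256        # buffers posted per peer connection
peers = 128              # connections held by rank 0

per_peer = buffer_size * num_buffers     # 16 MiB per peer
total = per_peer * peers                 # 2 GiB across 128 peers
print(per_peer // 2**20, "MiB per peer;", total // 2**30, "GiB total")
```

This matches the user's arithmetic (65536*256*128 = 2 GiB) and suggests rank 0 alone is establishing all-to-one connections during MPI_Init.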
Re: [OMPI users] btl_openib_cpc_include rdmacm questions
Over IB, I'm not sure there is much of a drawback. It might be slightly slower to establish QP's, but I don't think that matters much.

Over iWARP, rdmacm can cause connection storms as you scale to thousands of MPI processes.

On Apr 20, 2011, at 5:03 PM, Brock Palen wrote:

> We managed to have another user hit the bug that causes collectives (this time MPI_Bcast()) to hang on IB that was fixed by setting:
>
> btl_openib_cpc_include rdmacm
>
> My question is if we set this to the default on our system with an environment variable does it introduce any performance or other issues we should be aware of?
>
> Is there a reason we should not use rdmacm?
>
> Thanks!
>
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> bro...@umich.edu
> (734)936-1985

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] Removing Portals BTLs
Sure - instead of what you did, just add --without-portals to your original configure. The exact option depends on what portals you have installed. Here is the relevant part of the "./configure -h" output:

  --with-portals=DIR      Specify the installation directory of PORTALS
  --with-portals-libs=LIBS
                          Libraries to link with for portals
  --with-portals4(=DIR)   Build Portals4 support, optionally adding
                          DIR/include, DIR/lib, and DIR/lib64 to the
                          search path for headers and libraries
  --with-portals4-libdir=DIR
                          Search for Portals4 libraries in DIR

Just do --without-portals or --without-portals4 (you don't need the matching libdir option), whichever matches what you have.

On Apr 21, 2011, at 11:34 AM, Paul Monday wrote:

> Hi,
>
> I am trying to get rid of the following error message when I use mpirun.
>
> mca: base: component_find: "mca_ess_portals_utcp" does not appear to be a valid ess MCA dynamic component (ignored): /usr/local/lib/openmpi/mca_ess_portals_utcp.so: undefined symbol: mca_ess_portals_utcp_component
>
> I am trying to remove the portals components altogether... here's why:
>
> When I originally built openmpi, I used a simple configuration string:
>
> ./configure --with-threads=posix --enable-mpi-threads --with-openib=/usr --with-openib-libdir=/usr/lib64 --disable-mpi-cxx
>
> This gives me an error while the make is running, most likely a problem with my Portals installation. So, I just want to skip Portals BTLs.
>
> /usr/bin/ld: /usr/local/lib/libp3api.a(libp3api_a-acl.o): relocation R_X86_64_32S against `p3_api_process' can not be used when making a shared object; recompile with -fPIC
> /usr/local/lib/libp3api.a: could not read symbols: Bad value
> collect2: ld returned 1 exit status
> make[2]: *** [libmpi.la] Error 1
> make[2]: Leaving directory `/mnt/shared/apps/openmpi-1.4.3/ompi'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/mnt/shared/apps/openmpi-1.4.3/ompi'
> make: *** [all-recursive] Error 1
>
> So I changed the configuration to:
>
> ./configure --with-threads=posix --enable-mpi-threads --with-openib=/usr --with-openib-libdir=/usr/lib64 --disable-mpi-cxx --enable-mca-no-build=btl-portals,ess-portals_utcp,common-portals,mtl-portals
>
> This allowed OpenMPI to build, but then I receive the runtime error above. Is there a way to stop the Portals pieces from even trying to build and run?
>
> Paul Monday
[OMPI users] Removing Portals BTLs
Hi,

I am trying to get rid of the following error message when I use mpirun.

  mca: base: component_find: "mca_ess_portals_utcp" does not appear to be a valid ess MCA dynamic component (ignored): /usr/local/lib/openmpi/mca_ess_portals_utcp.so: undefined symbol: mca_ess_portals_utcp_component

I am trying to remove the portals components altogether... here's why:

When I originally built openmpi, I used a simple configuration string:

  ./configure --with-threads=posix --enable-mpi-threads --with-openib=/usr --with-openib-libdir=/usr/lib64 --disable-mpi-cxx

This gives me an error while the make is running, most likely a problem with my Portals installation. So, I just want to skip Portals BTLs.

  /usr/bin/ld: /usr/local/lib/libp3api.a(libp3api_a-acl.o): relocation R_X86_64_32S against `p3_api_process' can not be used when making a shared object; recompile with -fPIC
  /usr/local/lib/libp3api.a: could not read symbols: Bad value
  collect2: ld returned 1 exit status
  make[2]: *** [libmpi.la] Error 1
  make[2]: Leaving directory `/mnt/shared/apps/openmpi-1.4.3/ompi'
  make[1]: *** [all-recursive] Error 1
  make[1]: Leaving directory `/mnt/shared/apps/openmpi-1.4.3/ompi'
  make: *** [all-recursive] Error 1

So I changed the configuration to:

  ./configure --with-threads=posix --enable-mpi-threads --with-openib=/usr --with-openib-libdir=/usr/lib64 --disable-mpi-cxx --enable-mca-no-build=btl-portals,ess-portals_utcp,common-portals,mtl-portals

This allowed OpenMPI to build, but then I receive the runtime error above. Is there a way to stop the Portals pieces from even trying to build and run?

Paul Monday
Re: [OMPI users] Bug in MPI_scatterv Fortran-90 implementation
I do believe you found a bona-fide bug. Could you try the attached patch? (I think it should only affect f90 "large" builds)

You should be able to check it quickly via:

  cd top_of_ompi_source_tree
  patch -p0 < scatterv-f90.patch
  cd ompi/mpi/f90
  make clean
  rm mpi_scatterv_f90.f90
  make all install

On Apr 21, 2011, at 10:37 AM, Stanislav Sazykin wrote:

> Hello,
>
> I came across what appears to be an error in the implementation of the MPI_scatterv Fortran-90 version. I am using OpenMPI 1.4.3 on Linux. This comes up when OpenMPI was configured with --with-mpi-f90-size=medium or --with-mpi-f90-size=large
>
> The standard specifies that the interface is
>
>   MPI_SCATTERV(SENDBUF, SENDCOUNTS, DISPLS, SENDTYPE, RECVBUF,
>                RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR)
>       <type> SENDBUF(*), RECVBUF(*)
>       INTEGER SENDCOUNTS(*), DISPLS(*), SENDTYPE
>
> so that SENDCOUNTS and DISPLS are integer arrays. However, if I compile a fortran code with calls to MPI_scatterv and compile with argument checks, two Fortran compilers (Intel and Lahey) produce fatal errors saying there is no matching interface.
>
> Looking in the source code of OpenMPI, I see that in ompi/mpi/f90/scripts, the script mpi_scatterv_f90.f90.sh that is invoked when running "make" produces Fortran interfaces that list both SENDCOUNTS and DISPLS as
>
>   integer, intent(in) ::
>
> This appears to be an error, as it would be illegal to pass a scalar variable and receive it as an array in the subroutine. I have not figured out what happens in the code at this invocation (the code is complicated), but it seems like a segfault situation.
>
> --
> Stan Sazykin

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

[Attachment: scatterv-f90.patch]
[OMPI users] Bug in MPI_scatterv Fortran-90 implementation
Hello,

I came across what appears to be an error in the implementation of the MPI_scatterv Fortran-90 version. I am using OpenMPI 1.4.3 on Linux. This comes up when OpenMPI was configured with --with-mpi-f90-size=medium or --with-mpi-f90-size=large

The standard specifies that the interface is

  MPI_SCATTERV(SENDBUF, SENDCOUNTS, DISPLS, SENDTYPE, RECVBUF,
               RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR)
      <type> SENDBUF(*), RECVBUF(*)
      INTEGER SENDCOUNTS(*), DISPLS(*), SENDTYPE

so that SENDCOUNTS and DISPLS are integer arrays. However, if I compile a fortran code with calls to MPI_scatterv and compile with argument checks, two Fortran compilers (Intel and Lahey) produce fatal errors saying there is no matching interface.

Looking in the source code of OpenMPI, I see that in ompi/mpi/f90/scripts, the script mpi_scatterv_f90.f90.sh that is invoked when running "make" produces Fortran interfaces that list both SENDCOUNTS and DISPLS as

  integer, intent(in) ::

This appears to be an error, as it would be illegal to pass a scalar variable and receive it as an array in the subroutine. I have not figured out what happens in the code at this invocation (the code is complicated), but it seems like a segfault situation.

--
Stan Sazykin
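[Editor's note] The heart of the report is that SENDCOUNTS and DISPLS are per-rank arrays, not scalars, so an interface declaring them `integer, intent(in)` (scalar) can never match a standard-conforming call. A pure-Python sketch of MPI_Scatterv's root-side semantics shows why arrays are required (illustrative only, not the MPI implementation):

```python
# Root-side semantics of MPI_Scatterv: rank i receives sendcounts[i]
# elements starting at offset displs[i] of the root's send buffer.
# One entry per rank => both arguments must be arrays.
def scatterv(sendbuf, sendcounts, displs):
    return [sendbuf[d:d + c] for c, d in zip(sendcounts, displs)]

data = list(range(10))
chunks = scatterv(data, sendcounts=[4, 3, 3], displs=[0, 4, 7])
print(chunks)   # prints [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Because counts and displacements may differ per rank (that is the whole point of Scatterv over Scatter), a scalar interface is not just a pedantic mismatch; it would make uneven decompositions inexpressible.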
[OMPI users] using openib and psm together
We have an installation with both Mellanox and Qlogic IB adaptors (in distinct islands), so I built open-mpi 1.4.3 with openib and psm support.

Now I've just read this in the OFED source, but I can't see any relevant issue in the open-mpi tracker:

  OpenMPI support
  ---
  It is recommended to use the OpenMPI v1.5 development branch. Prior versions of OpenMPI have an issue with support PSM network transports mixed with standard Verbs transport (BTL openib). This prevents an OpenMPI installation with network modules available for PSM and Verbs to work correctly on nodes with no QLogic IB hardware. This has been fixed in the latest development branch allowing a single OpenMPI installation to target IB hardware via PSM or Verbs as well as alternate transports seamlessly.

Do I definitely need 1.5 (and is 1.5.3 good enough?) to have openib and psm working correctly? Also, what are the symptoms of it not working correctly?