[OMPI users] --mca btl params
What are the allowable values for the --mca btl parameter on the mpirun command line? – Jeff

Jeffrey A. Cummings
Engineering Specialist
Mission Analysis and Operations Department
Systems Analysis and Simulation Subdivision
Systems Engineering Division
Engineering and Technology Group
The Aerospace Corporation
571-304-7548
jeffrey.a.cummi...@aero.org

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Andy Riebs
Sent: Tuesday, October 09, 2018 2:34 PM
To: users@lists.open-mpi.org
Subject: Re: [OMPI users] no openmpi over IB on new CentOS 7 system

Noam,

Start with the FAQ, etc., under "Getting Help/Support" in the left-column menu at https://www.open-mpi.org/

Andy

From: Noam Bernstein <mailto:noam.bernst...@nrl.navy.mil>
Sent: Tuesday, October 09, 2018 2:26 PM
To: Open Mpi Users <mailto:users@lists.open-mpi.org>
Subject: [OMPI users] no openmpi over IB on new CentOS 7 system

Hi - I'm trying to get OpenMPI working on a newly configured CentOS 7 system, and I'm not even sure what information would be useful to provide. I'm using the CentOS built-in libibverbs and/or libfabric, and I configure openmpi with just

  --with-verbs --with-ofi --prefix=$DEST

I also tried --without-ofi, with no change. Basically, I can run with "--mca btl self,vader", but if I try "--mca btl self,openib" I get an error from each process:

  [compute-0-0][[24658,1],5][connect/btl_openib_connect_udcm.c:1245:udcm_rc_qp_to_rtr] error modifing QP to RTR
  errno says Invalid argument

If I don't specify the btl, it appears to try to set up openib with the same errors, then crashes on some free()-related segfault, presumably when it tries to actually use vader. The machine seems to be able to see its IB interface, as reported by things like ibstatus or ibv_devinfo. I'm not sure what else to look for. I also confirmed that "ulimit -l" reports unlimited. Does anyone have any suggestions as to how to diagnose this issue?
thanks, Noam
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
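On the first question in this thread (the allowable --mca btl values): the legal values are simply the BTL component names compiled into the local installation, which ompi_info can list. A sketch, assuming ompi_info is on the PATH (exact component names vary by build and version):

```shell
# List the BTL components this Open MPI build provides; the names shown
# (e.g. self, sm or vader, tcp, openib) are the values accepted after "--mca btl".
ompi_info | grep "MCA btl"

# Show the btl parameters with descriptions.
ompi_info --param btl all
```

Multiple components are given as a comma-separated list, e.g. --mca btl self,vader,tcp.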
Re: [OMPI users] OpenMPI v3.0 on Cygwin
The MS-MPI developers disagree with your statement below and claim to be actively working on MPI-3.0 compliance, with an expected new version release about every six months. - Jeff

Jeffrey A. Cummings
Engineering Specialist
Performance Modeling and Analysis Department
Systems Analysis and Simulation Subdivision
Systems Engineering Division
Engineering and Technology Group
The Aerospace Corporation
571-304-7548
jeffrey.a.cummi...@aero.org

-----Original Message-----
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Llelan D.
Sent: Wednesday, September 27, 2017 3:22 PM
To: users@lists.open-mpi.org
Subject: Re: [OMPI users] OpenMPI v3.0 on Cygwin

On 09/27/2017 4:36 AM, Marco Atzeri wrote:
> After I finish on 2.1.2 I will look on 3.0.

Thank you for your response. I am looking forward to a Cygwin release. If you could send me some guidelines as to the preferred manner of doing this, as was done with previous versions, I could work on it myself. The 1.10 Cygport version compiles and packages just fine, so I'll look at what was done to that for now and try to translate it to v3.0.

@Jeffrey A Cummings: OpenMPI has abandoned *NATIVE* Windows support (*.lib for compilers like MSVC, etc.), not Cygwin, though a v2.* release has been slow. The msmpi package has never fully supported the MPI specification (or even come close) and has been long (and silently) abandoned by MS as people have preferred other fully compliant implementations.
Re: [OMPI users] OpenMPI v3.0 on Cygwin
The OpenMPI developers stopped supporting Windows a long time (several major versions) ago. Microsoft has a free version of MPI for Windows available for download. There's no guarantee it will be free forever, but it is free for now. I've been using it for about a year and it works for me. My usage of MPI is pretty vanilla, so there may be features missing in Microsoft's implementation. It might be worth your while to at least look at it. – Jeff

-----Original Message-----
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Llelan D.
Sent: Wednesday, September 27, 2017 3:31 AM
To: users@lists.open-mpi.org
Subject: [OMPI users] OpenMPI v3.0 on Cygwin

Can OpenMPI v3.0 be compiled for Cygwin64 on Windows 10? Using:

  ./configure --prefix=/usr/local
  [blah, blah... apparently successful (at least it doesn't say there's an error)]
  make -j 12 all

I'm getting a slew of compiler errors about redefinitions between /usr/include/w32api/psdk_inc/_ip_types.h or /usr/include/w32api/winsock2.h and /usr/include/netdb.h or /usr/include/sys/socket.h. Are there magic variables, definitions, or switches for a Cygwin build I'm missing?
[OMPI users] MPI_Finalize?
What does MPI_Finalize actually do? Would it be harmful to synchronize all processes with a call to MPI_Barrier and then just exit, i.e., without calling MPI_Finalize? I'm asking because I'm getting a segmentation error in MPI_Finalize. – Jeff

Jeffrey A. Cummings
Engineering Specialist
Performance Modeling and Analysis Department
Systems Analysis and Simulation Subdivision
Systems Engineering Division
Engineering and Technology Group
The Aerospace Corporation
571-304-7548
jeffrey.a.cummi...@aero.org
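For context on the question above: MPI_Finalize completes outstanding communication and releases the library's internal resources, and the MPI standard requires every process to call it before exiting; an MPI_Barrier is not a substitute, so skipping MPI_Finalize leaves the program non-conforming even if it happens to work. A minimal conforming shutdown, as a sketch:

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* ... application work ... */

    MPI_Barrier(MPI_COMM_WORLD); /* optional: synchronize before shutdown */
    MPI_Finalize();              /* required: completes pending communication
                                    and releases library resources */
    return 0;
}
```

A segfault inside MPI_Finalize itself often points to earlier memory corruption in the application (e.g. a buffer overrun clobbering library state) rather than a bug in the finalize call.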
Re: [OMPI users] Problem moving from 1.4 to 1.6
I've tried that parameter, but with the order self,sm,tcp. Should this behave differently than the order you suggest?

Jeffrey A. Cummings
Engineering Specialist
Performance Modeling and Analysis Department
Systems Analysis and Simulation Subdivision
Systems Engineering Division
Engineering and Technology Group
The Aerospace Corporation
571-307-4220
jeffrey.a.cummi...@aero.org

From: Ralph Castain <r...@open-mpi.org>
To: Open MPI Users <us...@open-mpi.org>
Date: 06/30/2014 02:13 PM
Subject: Re: [OMPI users] Problem moving from 1.4 to 1.6
Sent by: "users" <users-boun...@open-mpi.org>

Yeah, this was built with a bunch of stuff you don't want. Are you trying to just run with TCP and not Infiniband? If so, then you want

  mpirun -mca btl tcp,sm,self

and the problem should go away.

On Jun 30, 2014, at 11:06 AM, Jeffrey A Cummings <jeffrey.a.cummi...@aero.org> wrote:

Output from ompi_info:

Package: Open MPI root@centos-6-5-x86-64-1.localdomain Distribution
Open MPI: 1.6.2
Open MPI SVN revision: r27344
Open MPI release date: Sep 18, 2012
Open RTE: 1.6.2
Open RTE SVN revision: r27344
Open RTE release date: Sep 18, 2012
OPAL: 1.6.2
OPAL SVN revision: r27344
OPAL release date: Sep 18, 2012
MPI API: 2.1
Ident string: 1.6.2
Prefix: /opt/openmpi
Configured architecture: x86_64-unknown-linux-gnu
Configure host: centos-6-5-x86-64-1.localdomain
Configured by: root
Configured on: Mon Apr 14 02:23:27 PDT 2014
Configure host: centos-6-5-x86-64-1.localdomain
Built by: root
Built on: Mon Apr 14 02:33:37 PDT 2014
Built host: centos-6-5-x86-64-1.localdomain
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
Fortran90 bindings size: small
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C compiler family name: GNU
C compiler version: 4.4.7
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fortran77 compiler: gfortran
Fortran77 compiler abs: /usr/bin/gfortran
Fortran90 compiler: gfortran
Fortran90 compiler abs: /usr/bin/gfortran
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
C++ exceptions: no
Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)
Sparse Groups: no
Internal debug support: no
MPI interface warnings: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
Heterogeneous support: no
mpirun default --prefix: no
MPI I/O support: yes
MPI_WTIME support: gettimeofday
Symbol vis. support: yes
Host topology support: yes
MPI extensions: affinity example
FT Checkpoint support: no (checkpoint thread: no)
VampirTrace support: yes
MPI_MAX_PROCESSOR_NAME: 256
MPI_MAX_ERROR_STRING: 256
MPI_MAX_OBJECT_NAME: 64
MPI_MAX_INFO_KEY: 36
MPI_MAX_INFO_VAL: 256
MPI_MAX_PORT_NAME: 1024
MPI_MAX_DATAREP_STRING: 128
MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.6.2)
MCA memory: linux (MCA v2.0, API v2.0, Component v1.6.2)
MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.2)
MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.6.2)
MCA carto: file (MCA v2.0, API v2.0, Component v1.6.2)
MCA shmem: mmap (MCA v2.0, API v2.0, Component v1.6.2)
MCA shmem: posix (MCA v2.0, API v2.0, Component v1.6.2)
MCA shmem: sysv (MCA v2.0, API v2.0, Component v1.6.2)
MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.6.2)
MCA maffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.2)
MCA timer: linux (MCA v2.0, API v2.0, Component v1.6.2)
MCA installdirs: env (MCA v2.0, API v2.0, Component v1.6.2)
MCA installdirs: config (MCA v2.0, API v2.0, Component v1.6.2)
MCA sysinfo: linux (MCA v2.0, API v2.0, Component v1.6.2)
MCA hwloc: hwloc132 (MCA v2.0, API v2.0, Component v1.6.2)
MCA dpm: orte (MCA v2.0, API v2.0, Component v1.6.2)
MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.6.2)
MCA allocator: basic (MCA v2.0, API v2.0, Component v1.6.2)
MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.6.2)
MCA coll: basic (MCA v2.0, API v2.0, Component v1.6.2)
MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.6.2)
MCA coll: inter (MCA v2.0, API v2.0, Component v1.6.2)
MCA coll: self (MCA v2.0, API v2.0, Component v1.6.2)
MCA coll: sm (MCA v2.0, API v2.0, Component v1.6.2)
MCA btl: ofud (MCA v2.0, API v2.0, Component v1.6.2)
MCA btl: openib (MCA v2.0, API v2.0, Component v1.6.2)
MCA btl: sm (MCA v2.0, API v2.0, Component v1.6.2)
MCA btl: tcp (MCA v2.0, API v2.0, Component v1.6.2)
MCA btl: udapl (MCA v2.0, API v2.0, Component v1.6.2)
MCA topo: unity (MCA v2.0, API v2.0, Component v1.6.2)
MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.6.2)
MCA osc: rdma (MCA v2.0, API v2.0, Component v1.6.2)
MCA iof: hnp (MCA v2.0, API v2.0, Component v1.6.2)
MCA iof: orted (MCA v2.0, API v2.0, Component v1.6.2)
MCA iof: tool (MCA v2.0, API v2.0, Component v1.6.2)
MCA oob: tcp (MCA v2.0, API v2.0, Component v1.6.2)
MCA odls: default (MCA v2.0, API v2.0, Component v1.6.2)
MCA ras: cm (MCA v2.0, API v2.0, Component v1.6.2)
MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.6.2)
MCA ras: loadleveler (MCA v2.0, API v2.0, Component v1.6.2)
MCA ras: slurm (MCA v2.0, API v2.0, Component v1.6.2)
MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.6.2)
MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.6.2)
MCA rmaps: resilient (MCA v2.0, API v2.0, Component v1.6.2)
MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.6.2)
MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.6.2)
MCA rmaps: topo (MCA v2.0, API v2.0, Component v1.6.2)
MCA rml: oob (MCA v2.0, API v2.0, Component v1.6.2)
MCA routed: binomial (MCA v2.0, API v2.0, Component v1.6.2)
MCA routed: cm (MCA v2.0, API v2.0, Component v1.6.2)
MCA routed: direct (MCA v2.0, API v2.0, Component v1.6.2)
MCA routed: linear (MCA v2.0, API v2.0, Component v1.6.2)
MCA routed: radix (MCA v2.0, API v2.0, Component v1.6.2)
MCA routed: slave (MCA v2.0, API v2.0, Component v1.6.2)
MCA plm: rsh (MCA v2.0, API v2.0, Component v1.6.2)
MCA plm: slurm (MCA v2.0, API v2.0, Component v1.6.2)
MCA filem: rsh (MCA v2.0, API v2.0, Component v1.6.2)
MCA errmgr: default (MCA v2.0, API v2.0, Component v1.6.2)
MCA ess: env (MCA v2.0, API v2.0, Component v1.6.2)
MCA ess: hnp (MCA v2.0, API v2.0, Component v1.6.2)
MCA ess: singleton (MCA v2.0, API v2.0, Component v1.6.2)
MCA ess: slave (MCA v2.0, API v2.0, Component v1.6.2)
MCA ess: slurm (MCA v2.0, API v2.0, Component v1.6.2)
MCA ess: slurmd (MCA v2.0, API v2.0, Component v1.6.2)
MCA ess: tool (MCA v2.0, API v2.0, Component v1.6.2)
MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.6.2)
MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.6.2)
MCA notifier: command (MCA v2.0, API v1.0, Component v1.6.2)
MCA notifier: syslog (MCA v2.0, API v1.0, Component v1.6.2)

Jeffrey A. Cummings
Engineering Specialist
Performance Modeling and Analysis Department
Systems Analysis and Simulation Subdivision
Systems Engineering Division
Engineering and Technology Group
The Aerospace Corporation
571-307-4220
jeffrey.a.cummi...@aero.org

From: Ralph Castain <r...@open-mpi.org>
To: Open MPI Users <us...@open-mpi.org>
Date: 06/27/2014 03:42 PM
Subject: Re: [OMPI users] Problem moving from 1.4 to 1.6
Sent by: "users" <users-boun...@open-mpi.org>

Let me steer you on a different course. Can you run "ompi_info" and paste the output here? It looks to me like someone installed a version that includes uDAPL support, so you may have to disable some additional things to get it to run.

On Jun 27, 2014, at 9:53 AM, Jeffrey A Cummings <jeffrey.a.cummi...@aero.org> wrote:

We have recently upgraded our cluster to a version of Linux which comes with openMPI version 1.6.2. An application which ran previously (using some version of 1.4) now errors out with the following messages:

librdmacm: Fatal: no RDMA devices found
librdmacm: Fatal: no RDMA devices found
librdmacm: Fatal: no RDMA devices found

--
WARNING: Failed to open "OpenIB-cma" [DAT_INTERNAL_ERROR:]. This may be a real error or it may be an invalid entry in the uDAPL Registry which is contained in the dat.conf file. Contact your local System Administrator to confirm the availability of the interfaces in the dat.conf file.
--
Re: [OMPI users] Problem moving from 1.4 to 1.6
Once again, you guys are assuming (incorrectly) that all your users are working in an environment where everyone is free (based on corporate IT policies) to do things like that. As an aside, you're also assuming that all your users are Unix/Linux experts. I've been following this list for several years and couldn't even begin to count the number of questions from the non-experts who are struggling with something which is trivial for you but not for them.

Jeffrey A. Cummings
Engineering Specialist
Performance Modeling and Analysis Department
Systems Analysis and Simulation Subdivision
Systems Engineering Division
Engineering and Technology Group
The Aerospace Corporation
571-307-4220
jeffrey.a.cummi...@aero.org

From: Reuti <re...@staff.uni-marburg.de>
To: Open MPI Users <us...@open-mpi.org>
Date: 06/27/2014 02:03 PM
Subject: Re: [OMPI users] Problem moving from 1.4 to 1.6
Sent by: "users" <users-boun...@open-mpi.org>

Hi,

Am 27.06.2014 um 19:56 schrieb Jeffrey A Cummings:
> I appreciate your response and I understand the logic behind your suggestion, but you and the other regular expert contributors to this list are frequently working under a misapprehension. Many of your openMPI users don't have any control over what version of openMPI is available on their system. I'm stuck with whatever version my IT people choose to bless, which in general is the (possibly old and/or moldy) version that is bundled with some larger package (i.e., Rocks, Linux). The fact that I'm only now seeing this 1.4 to 1.6 problem illustrates the situation I'm in. I really need someone to dig into their memory archives to see if they can come up with a clue for me.

You can freely download the Open MPI source and install it, for example, in your personal ~/local/openmpi-1.8 or alike. Pointing your $PATH and $LD_LIBRARY_PATH to your own version will supersede the installed system one.

-- Reuti
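Reuti's suggestion of a personal install can be sketched as follows; the version number and install prefix are illustrative choices, not prescribed values:

```shell
# Build a personal Open MPI under $HOME; no root or IT involvement needed.
tar xjf openmpi-1.8.1.tar.bz2
cd openmpi-1.8.1
./configure --prefix=$HOME/local/openmpi-1.8
make -j4 all && make install

# Put the private install ahead of the system one for this shell
# (add to ~/.bashrc or the equivalent to make it permanent).
export PATH=$HOME/local/openmpi-1.8/bin:$PATH
export LD_LIBRARY_PATH=$HOME/local/openmpi-1.8/lib:$LD_LIBRARY_PATH
```

After this, `which mpirun` should report the copy under $HOME, confirming the private version supersedes the system one.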
Re: [OMPI users] Problem moving from 1.4 to 1.6
I appreciate your response and I understand the logic behind your suggestion, but you and the other regular expert contributors to this list are frequently working under a misapprehension. Many of your openMPI users don't have any control over what version of openMPI is available on their system. I'm stuck with whatever version my IT people choose to bless, which in general is the (possibly old and/or moldy) version that is bundled with some larger package (i.e., Rocks, Linux). The fact that I'm only now seeing this 1.4 to 1.6 problem illustrates the situation I'm in. I really need someone to dig into their memory archives to see if they can come up with a clue for me.

Jeffrey A. Cummings
Engineering Specialist
Performance Modeling and Analysis Department
Systems Analysis and Simulation Subdivision
Systems Engineering Division
Engineering and Technology Group
The Aerospace Corporation
571-307-4220
jeffrey.a.cummi...@aero.org

From: Gus Correa <g...@ldeo.columbia.edu>
To: Open MPI Users <us...@open-mpi.org>
Date: 06/27/2014 01:45 PM
Subject: Re: [OMPI users] Problem moving from 1.4 to 1.6
Sent by: "users" <users-boun...@open-mpi.org>

It may be easier to install the latest OMPI from the tarball, rather than trying to sort out the error.

http://www.open-mpi.org/software/ompi/v1.8/

The packaged build of (somewhat old) OMPI 1.6.2 that came with Linux may not have been built against the same IB libraries, hardware, and configuration you have. [The error message reference to udapl is ominous.]

> The mpirun command line contains the argument '--mca btl ^openib', which
> I thought told mpi to not look for the ib interface.

As you said, the mca parameter above tells OMPI not to use openib, although it may not be the only cause of the problem. If you want to use openib, switch to

  --mca btl openib,sm,self

Another thing to check is whether there is a mixup of environment variables, PATH and LD_LIBRARY_PATH perhaps pointing to the old OMPI version you may have installed.

My two cents,
Gus Correa

On 06/27/2014 12:53 PM, Jeffrey A Cummings wrote:
> We have recently upgraded our cluster to a version of Linux which comes
> with openMPI version 1.6.2.
>
> An application which ran previously (using some version of 1.4) now
> errors out with the following messages:
>
> librdmacm: Fatal: no RDMA devices found
> librdmacm: Fatal: no RDMA devices found
> librdmacm: Fatal: no RDMA devices found
>
> --
> WARNING: Failed to open "OpenIB-cma" [DAT_INTERNAL_ERROR:]. This may be
> a real error or it may be an invalid entry in the uDAPL Registry which
> is contained in the dat.conf file. Contact your local System
> Administrator to confirm the availability of the interfaces in the
> dat.conf file.
> --
>
> [tupile:25363] 2 more processes have sent help message
> help-mpi-btl-udapl.txt / dat_ia_open fail
> [tupile:25363] Set MCA parameter "orte_base_help_aggregate" to 0 to see
> all help / error messages
>
> The mpirun command line contains the argument '--mca btl ^openib', which
> I thought told mpi to not look for the ib interface.
>
> Can anyone suggest what the problem might be? Did the relevant syntax
> change between versions 1.4 and 1.6?
>
> Jeffrey A. Cummings
> Engineering Specialist
> Performance Modeling and Analysis Department
> Systems Analysis and Simulation Subdivision
> Systems Engineering Division
> Engineering and Technology Group
> The Aerospace Corporation
> 571-307-4220
> jeffrey.a.cummi...@aero.org
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/06/24721.php

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2014/06/24722.php
[OMPI users] Problem moving from 1.4 to 1.6
We have recently upgraded our cluster to a version of Linux which comes with openMPI version 1.6.2. An application which ran previously (using some version of 1.4) now errors out with the following messages:

librdmacm: Fatal: no RDMA devices found
librdmacm: Fatal: no RDMA devices found
librdmacm: Fatal: no RDMA devices found

--
WARNING: Failed to open "OpenIB-cma" [DAT_INTERNAL_ERROR:]. This may be a real error or it may be an invalid entry in the uDAPL Registry which is contained in the dat.conf file. Contact your local System Administrator to confirm the availability of the interfaces in the dat.conf file.
--

[tupile:25363] 2 more processes have sent help message help-mpi-btl-udapl.txt / dat_ia_open fail
[tupile:25363] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

The mpirun command line contains the argument '--mca btl ^openib', which I thought told mpi to not look for the ib interface. Can anyone suggest what the problem might be? Did the relevant syntax change between versions 1.4 and 1.6?

Jeffrey A. Cummings
Engineering Specialist
Performance Modeling and Analysis Department
Systems Analysis and Simulation Subdivision
Systems Engineering Division
Engineering and Technology Group
The Aerospace Corporation
571-307-4220
jeffrey.a.cummi...@aero.org
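The thread above comes down to Open MPI's two BTL selection forms: since this 1.6.2 build also contains a udapl BTL, excluding openib alone does not silence the uDAPL errors. A sketch of both forms (my_app and the process count are placeholders; the caret prefix excludes every component in the list, and the two forms cannot be mixed in one value):

```shell
# Exclusive form: run with everything except the listed BTLs.
mpirun --mca btl ^openib,udapl -np 4 ./my_app

# Inclusive form: run with only the listed BTLs (order does not matter;
# Open MPI picks the best available transport per peer).
mpirun --mca btl tcp,sm,self -np 4 ./my_app
```

The inclusive form is the safer fix here, because it cannot be defeated by some other transport component the packager happened to compile in.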
Re: [OMPI users] Deadlocks and warnings from libevent when using MPI_THREAD_MULTIPLE
Wouldn't you save yourself work and your users' confusion if you disabled options that don't currently work?

Jeffrey A. Cummings
Engineering Specialist
Performance Modeling and Analysis Department
Systems Analysis and Simulation Subdivision
Systems Engineering Division
Engineering and Technology Group
The Aerospace Corporation
571-307-4220
jeffrey.a.cummi...@aero.org

From: Ralph Castain <r...@open-mpi.org>
To: Open MPI Users <us...@open-mpi.org>
Date: 04/25/2014 05:40 PM
Subject: Re: [OMPI users] Deadlocks and warnings from libevent when using MPI_THREAD_MULTIPLE
Sent by: "users" <users-boun...@open-mpi.org>

We don't fully support THREAD_MULTIPLE, and most definitely not when using IB. We are planning on extending that coverage in the 1.9 series.

On Apr 25, 2014, at 2:22 PM, Markus Wittmann <markus.wittm...@fau.de> wrote:

> Hi everyone,
>
> I'm using the current Open MPI 1.8.1 release and observe non-deterministic
> deadlocks and warnings from libevent when using MPI_THREAD_MULTIPLE. Open MPI
> has been configured with --enable-mpi-thread-multiple --with-tm --with-verbs
> (see attached config.log)
>
> Attached is a sample application that spawns a thread for each process after
> MPI_Init_thread has been called. The thread then calls MPI_Recv, which blocks
> until the matching MPI_Send is called just before MPI_Finalize is called in
> the main thread. (AFAIK MPICH uses this kind of facility to implement a
> progress thread.) Meanwhile the main thread exchanges data with its
> right/left neighbor via ISend/IRecv.
>
> I only see this when the MPI processes run on separate nodes, like in the
> following:
>
> $ mpiexec -n 2 -map-by node ./test
> [0] isend/irecv.
> [0] progress thread...
> [0] waitall.
> [warn] opal_libevent2021_event_base_loop: reentrant invocation. Only one event_base_loop can run on each event_base at once.
> [1] isend/irecv.
> [1] progress thread...
> [1] waitall.
> [warn] opal_libevent2021_event_base_loop: reentrant invocation. Only one event_base_loop can run on each event_base at once.
>
> Can anybody confirm this?
>
> Best regards,
> Markus
>
> --
> Markus Wittmann, HPC Services
> Friedrich-Alexander-Universität Erlangen-Nürnberg
> Regionales Rechenzentrum Erlangen (RRZE)
> Martensstrasse 1, 91058 Erlangen, Germany
> http://www.rrze.fau.de/hpc/
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
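Given Ralph's point that full MPI_THREAD_MULTIPLE support is not guaranteed, a program that relies on a progress thread should check the thread level the library actually granted rather than assume it. A minimal sketch:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    /* Request full multi-threading; the library reports what it can grant. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_MULTIPLE) {
        /* Fall back or abort: concurrent MPI calls from multiple
           threads are not safe at lower levels. */
        fprintf(stderr, "warning: requested MPI_THREAD_MULTIPLE, got level %d\n",
                provided);
    }

    MPI_Finalize();
    return 0;
}
```

Note that a build may grant MPI_THREAD_MULTIPLE and still have transport-specific gaps (as with IB here), so the check is necessary but not sufficient.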
Re: [OMPI users] Windows support for OpenMPI
I would be disappointed to see the Windows support go away. I use it mostly for debugging, but it's valuable to me for that purpose. - Jeff Cummings

From: Durga Choudhury
To: Open MPI Users
Date: 12/07/2012 11:33 AM
Subject: Re: [OMPI users] Windows support for OpenMPI
Sent by: users-boun...@open-mpi.org

All,

Let me reiterate my (minimal, compared to other developers) support to the OpenMPI project. If all it takes is to build and run regression tests on a platform to add a feather in the cap, I am willing to do it.

The low interest in the Windows platform does not surprise me; most HPC infrastructures use a Unix-like setup, and those few who do use Windows likely use Microsoft's own HPC server and MPI stack rather than OpenMPI. That said, I believe that we should continue supporting Windows for this one reason, if nothing else: since Windows comes preinstalled on all PCs, new entrants to the field of computer programming are starting on a Windows-based machine. By providing Windows support for OpenMPI, we will make the project accessible to the younger generation and ensure they adopt it when they enter the work force. That is another reason I do not think that just because few people asked for it explicitly, few people are actually using it; the newbie types usually do not make explicit requests.

Thanks,
Durga

On Fri, Dec 7, 2012 at 10:28 AM, Jeff Squyres wrote:

Sorry for my late reply; I've been in the MPI Forum and Open MPI engineering meetings all week. Some points:

1. Yes, it would be a shame to lose all the Windows support that Shiqing did.

2. Microsoft has told me that they're of the mindset "the more, the merrier" for their platform (i.e., they'd love to have more than one MPI on Windows, but probably can't help develop/support Open MPI on Windows). Makes perfect sense to me.

3. I see that we have 2 volunteers to keep the build support going for the v1.6 series, and another volunteer to do continued development for v1.7 and beyond. But all of these would need good reasons to go forward (active Open MPI Windows users, financial support, etc.). It doesn't look like there is much support.

4. I'm bummed to hear that Windows building is broken in 1.6.x. $%#$%#@!! If anyone wants to take a gander at fixing it, I'd love to see your patches, for nothing other than just maintaining Windows support for the remainder of the 1.6.x series. But per #3, it may not be worth it.

5. Based on this feedback, it seems like we should remove the Windows support from the OMPI SVN trunk and all future versions. It can always be resurrected from SVN history if someone wants to pick up this effort again in the future.

On Dec 6, 2012, at 11:07 AM, Damien wrote:

> So far, I count three people interested in OpenMPI on Windows. That's not a case for ongoing support.
>
> Damien
>
> On 04/12/2012 11:32 AM, Durga Choudhury wrote:
>> All,
>>
>> Since I did not see any Microsoft/other 'official' folks pick up the ball, let me step up. I have been lurking on this list for quite a while and I am a generic scientific programmer (i.e., I use many frameworks such as OpenCL/OpenMP etc., not just MPI). Although I am primarily a Linux user, I do own multiple versions of Visual Studio licenses and have a small cluster that dual-boots to Windows/Linux (and more nodes can be added on demand). I cannot do any large-scale testing on this, but I can build and run regression tests etc.
>>
>> If the community needs the Windows support to continue, I can take up that responsibility, until a more capable person/group is found at least.
>>
>> Thanks
>> Durga
>>
>> On Mon, Dec 3, 2012 at 12:32 PM, Damien wrote:
>> All,
>>
>> I completely missed the message about Shiqing departing as the OpenMPI Windows maintainer. I'll try and keep Windows builds going for 1.6 at least; I have 2011 and 2013 Intel licenses and VS2008 and 2012, but not 2010. I see that the 1.6.3 code base already doesn't build on Windows in VS2012 :-(.
>>
>> While I can try and keep builds going, I don't have access to a Windows cluster right now, and I'm flat out on two other projects. I can test on my workstation, but that will only go so far. Longer-term, there needs to be a decision made on whether Windows gets to be a first-class citizen in OpenMPI or not. Jeff's already told me that 1.7 is lagging behind on Windows. It would be a shame to have all the work Shiqing put in gradually decay because it can't be supported enough. If there's any Microsoft/HPC/Azure folks observing this list, or any other vendors who run on Windows with OpenMPI, maybe we can see what can be done if you're interested.
>>
>> Damien
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
[OMPI users] Segmentation fault during MPI initialization
I've been having an intermittent failure during MPI initialization (v 1.4.3) for several months. It comes and goes as I make changes to my application, that is, changes unrelated to MPI calls. Even when I have a version of my app which shows the problem, it doesn't happen on every submittal. This is a representative stack trace:

[mtcompute-6-6:05845] *** Process received signal ***
[mtcompute-6-6:05845] Signal: Segmentation fault (11)
[mtcompute-6-6:05845] Signal code: Address not mapped (1)
[mtcompute-6-6:05845] Failing at address: 0x2ac352e0bd80
[mtcompute-6-6:05845] [ 0] /lib64/libpthread.so.0 [0x314ee0eb10]
[mtcompute-6-6:05845] [ 1] /opt/openmpi/lib/libmpi.so.0 [0x2b2b3d42fa70]
[mtcompute-6-6:05845] [ 2] /opt/openmpi/lib/libopen-pal.so.0(opal_progress+0x5a) [0x2b2b3fa694ea]
[mtcompute-6-6:05845] [ 3] /opt/openmpi/lib/libopen-rte.so.0 [0x2b2b3f80913c]
[mtcompute-6-6:05845] [ 4] /opt/openmpi/lib/libmpi.so.0 [0x2b2b3d3f160c]
[mtcompute-6-6:05845] [ 5] /opt/openmpi/lib/libmpi.so.0(MPI_Init+0xf0) [0x2b2b3d40eb00]
[mtcompute-6-6:05845] [ 6] /home/cummings/DART/DARTHome/bin/linux/DebrisProp [0x418610]
[mtcompute-6-6:05845] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x31df41d994]
[mtcompute-6-6:05845] [ 8] /home/cummings/DART/DARTHome/bin/linux/DebrisProp [0x417992]
[mtcompute-6-6:05845] *** End of error message ***
--
mpirun noticed that process rank 1 with PID 5845 on node mtcompute-6-6.local exited on signal 11 (Segmentation fault).
--

Any suggestions would be welcome. - Jeff Cummings
[OMPI users] Array version of MPI_Iprobe?
MPI_Iprobe returns a single status object if at least one message is waiting in a queue. I would like to do something similar (i.e., a non-blocking probe) that produces an array of status objects representing all messages waiting in the queue. I would then decide the order of actual message reception based on the source field of the status objects. Does anyone know of a way to accomplish this? Jeff Cummings
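MPI has no array-probe call, but one workaround (a sketch, not a standard API; the function name and caller-allocated array are my own invention) is to probe each candidate source rank separately and collect whatever statuses come back flagged. Note that repeated MPI_Iprobe calls with MPI_ANY_SOURCE would keep matching the same earliest message, which is why the loop pins the source:

```c
#include <mpi.h>

/* Fill 'pending' (caller-allocated, length 'nranks') with one status per
 * peer rank that has a message waiting; returns how many were found.
 * This only sees the *first* matching message per source. */
int probe_all_sources(MPI_Comm comm, int nranks, MPI_Status *pending)
{
    int count = 0;
    for (int src = 0; src < nranks; src++) {
        int flag = 0;
        MPI_Status st;
        MPI_Iprobe(src, MPI_ANY_TAG, comm, &flag, &st);
        if (flag)
            pending[count++] = st;   /* st.MPI_SOURCE == src */
    }
    return count;  /* caller can now MPI_Recv in whatever order it likes */
}
```

The caller can then sort or scan the returned statuses by `MPI_SOURCE` (and `MPI_TAG`) and post the matching MPI_Recv calls in the preferred order.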
[OMPI users] Socket ports
I'm attempting to launch my app via mpirun and a host file to use nodes on multiple stand-alone servers. mpirun is able to launch my app on all requested nodes on all servers, but my app doesn't seem to be able to communicate via the standard MPI API calls (send, recv, etc.). The problem seems to be that my sysadmin department has locked down most/all ports for plain socket connections. They are asking me which specific ports (or range of ports) are required by MPI. I'm assuming that mpirun uses secure sockets (ssh) to launch my app on all nodes, but that my app is not using secure sockets for the MPI calls. Does any of this make sense? I'm using version 1.4.0, I think. - Jeff Cummings
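By default Open MPI's TCP transport picks ephemeral ports, which is what trips firewalls like this. A sketch of how to pin it to an admin-approved range (the `btl_tcp_*` MCA parameter names exist in the 1.4 series; the range 10000-10099 is just an example your admins would choose):

```shell
# Restrict the MPI point-to-point (TCP BTL) traffic to ports 10000-10099.
mpirun --mca btl tcp,self \
       --mca btl_tcp_port_min_v4 10000 \
       --mca btl_tcp_port_range_v4 100 \
       -hostfile hosts ./my_app

# The out-of-band run-time control channel uses its own ports; the exact
# tunable names vary by release, so list them for your installed version:
ompi_info --param oob tcp
```

Both the BTL range and the OOB range need to be opened between all the servers involved; ssh (port 22) is only used for the initial launch.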
Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.
It strikes me that you should be able to use the tag to identify the message that is to be received. In other words, you receive a message from any source, but with a tag that identifies the message as containing the expected load value. - Jeff

From: Jeff Squyres
To: Open MPI Users
Date: 07/15/2011 07:36 AM
Subject: Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

+1 I reiterate what I said before:
>> > You will always only receive messages that were sent to *you*.
>> > There's no MPI_SEND_TO_ANYONE_WHO_IS_LISTENING functionality, for
>> > example. So your last statement: "But when it captures with ..
>> > MPI_ANY_SOURCE and MPI_ANY_TAG, the receiver will capture any message
>> > (even not targetted for it)" is incorrect.

When process A calls MPI_SEND to send to process B, the message is sent only to B, not to any other process, regardless of whether you use ANY_SOURCE, ANY_TAG, both, or neither.

On Jul 15, 2011, at 7:04 AM, Terry Dontje wrote:
> Well, MPI_Recv does give you the message that was sent specifically to the rank calling it by any of the processes in the communicator. If you think the message you received should have gone to another rank, then there is a bug somewhere. I would start by adding debugging printf's to your code to trace the messages, or by narrowing the code down to a small kernel so that you can prove to yourself that MPI is working the way it should; if not, you can show us where it is going wrong.
>
> --td
>
> On 7/15/2011 6:51 AM, Mudassar Majeed wrote:
>> I get the sender's rank in status.MPI_SOURCE, but it is different than expected. I need to receive the message that was sent to me, not just any message.
>>
>> regards,
>>
>> Date: Fri, 15 Jul 2011 06:33:41 -0400
>> From: Terry Dontje
>> Subject: Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.
>> To: us...@open-mpi.org
>>
>> Mudassar,
>>
>> You can do what you are asking. The receiver uses MPI_ANY_SOURCE for
>> the source rank value, and when you receive a message,
>> status.MPI_SOURCE will contain the rank of the actual sender, not the
>> receiver's rank. If you are not seeing that, then there is a bug somewhere.
>>
>> --td
>>
>> On 7/14/2011 9:54 PM, Mudassar Majeed wrote:
>> > Friend,
>> > I cannot specify the rank of the sender, because only
>> > the sender knows to which receiver the message is to be sent; the
>> > receiver does not know from which sender the message will come. I am
>> > doing research on load balancing in MPI applications
>> > where load is redistributed, so I need a receiver to
>> > receive a load value from a sender that it does not know. On the other
>> > hand, the sender calculates to which receiver this load
>> > value should be sent. So I want the sender to send a message
>> > containing the load to a receiver, while the receiver does not know from
>> > which sender the message will come. It is like send/receive with
>> > DATAGRAM sockets: the receiver receives, on its IP and
>> > port, the message that was directed to it. I want the same
>> > behavior, but it seems that this is not possible in MPI. Isn't it?
>> >
>> > regards,
>> > Mudassar
>> >
>> > *From:* Jeff Squyres
>> > *To:* Mudassar Majeed
>> > *Cc:* Open MPI Users
>> > *Sent:* Friday, July 15, 2011 3:30 AM
>> > *Subject:* Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.
>> >
>> > Right. I thought you were asking about receiving *another* message
>> > from whomever you just received from via ANY_SOURCE.
>> >
>> > If you want to receive from a specific sender, you just specify the
>> > rank you want to receive from -- not ANY_SOURCE.
>> >
>> > You will always only receive messages that were sent to *you*.
>> > There's no MPI_SEND_TO_ANYONE_WHO_IS_LISTENING functionality, for
>> > example. So your last statement: "But when it captures with ..
>> > MPI_ANY_SOURCE and MPI_ANY_TAG, the receiver will capture any message
>> > (even not targetted for it)" is incorrect.
>> >
>> > I guess I still don't understand your question...?
>> >
>> > On Jul 14, 2011, at 9:17 PM, Mudassar Majeed wrote:
>> > > I know this, but when I compare status.MPI_SOURCE with myid, they
>> > are different. I guess you need to reconsider my question. The
>> > MPI_Recv function seems to capture a message from the queue with some
>> > search parameters like source, tag, etc. So in case the receiver does
>> > not know the sender and wants to receive only
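The tag-based approach suggested in this thread can be sketched in code: the receiver posts MPI_Recv with MPI_ANY_SOURCE but a specific tag reserved for load messages, then learns the actual sender's rank from the status. (A sketch; `LOAD_TAG`, the double payload, and the fixed ranks are made up for illustration.)

```c
#include <mpi.h>
#include <stdio.h>

#define LOAD_TAG 42   /* illustrative tag reserved for load messages */

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Receiver: doesn't know the sender, but the tag narrows the
         * match to load messages addressed to *this* rank. */
        double load;
        MPI_Status st;
        MPI_Recv(&load, 1, MPI_DOUBLE, MPI_ANY_SOURCE, LOAD_TAG,
                 MPI_COMM_WORLD, &st);
        printf("rank 0 got load %g from rank %d\n", load, st.MPI_SOURCE);
    } else if (rank == 1) {
        /* Sender: computes the destination and sends only to it. */
        double load = 3.14;
        MPI_Send(&load, 1, MPI_DOUBLE, 0, LOAD_TAG, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

This mirrors the datagram-socket analogy from the thread: the receiver matches on "port" (tag) rather than on peer identity, and the message still only ever arrives at the rank it was addressed to.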
Re: [OMPI users] [WARNING: SPOOFED E-MAIL--Non-Aerospace Sender] Re: Problem with prebuilt ver 1.5.3 for windows
I've been following this list for several months now and have been quite impressed by the helpfulness of the list experts in response to most questions. So how come the pregnant silence in response to my question? I could really use some help here. - Jeff

From: Jeffrey A Cummings <jeffrey.a.cummi...@aero.org>
To: Open MPI Users <us...@open-mpi.org>
Date: 06/29/2011 04:18 PM
Subject: [WARNING: SPOOFED E-MAIL--Non-Aerospace Sender] Re: [OMPI users] Problem with prebuilt ver 1.5.3 for windows

Anyone (Shiqing perhaps) have any more thoughts on this problem? - Jeff

From: Damien <dam...@khubla.com>
To: Open MPI Users <us...@open-mpi.org>
Date: 06/24/2011 03:59 PM
Subject: Re: [OMPI users] Problem with prebuilt ver 1.5.3 for windows

Yeah, and I'm wrong too: InterlockedCompareExchange64 is available on 32-bit. I think this is one for Shiqing. You could build OpenMPI yourself if you have VS2008; it's pretty easy to do.

Damien

On 24/06/2011 1:51 PM, Jeffrey A Cummings wrote:
Damien - I'm using the 32-bit version of OpenMPI. I think the 64 refers to the size of the integer that the function works on, not the operating system version. I didn't have this problem with VS 2008, so I think they've changed something in VS 2010; I just don't know how to fix it. - Jeff

From: Damien <dam...@khubla.com>
To: Open MPI Users <us...@open-mpi.org>
Date: 06/24/2011 02:35 PM
Subject: Re: [OMPI users] Problem with prebuilt ver 1.5.3 for windows

Jeff, InterlockedCompareExchange64 is a 64-bit-only instruction. Are you running XP 32-bit? (I think you are, because I don't think there was an XP64 SP3.) You need the 32-bit OpenMPI version. If you are running a 64-bit OS but building a 32-bit executable, that instruction isn't available in 32-bit and you still need to link with 32-bit OpenMPI.
Damien

On 24/06/2011 12:16 PM, Jeffrey A Cummings wrote:
I'm having a problem using the prebuilt Windows version 1.5.3 with my app built with MS Visual Studio 2010. I get an error message (for each node) that says: "The procedure entry point InterlockedCompareExchange64 could not be located in the dynamic link library KERNEL32.dll". I'm running Windows XP, SP3. - Jeff Cummings
[OMPI users] Problem with prebuilt ver 1.5.3 for windows
I'm having a problem using the prebuilt Windows version 1.5.3 with my app built with MS Visual Studio 2010. I get an error message (for each node) that says: "The procedure entry point InterlockedCompareExchange64 could not be located in the dynamic link library KERNEL32.dll". I'm running Windows XP, SP3. - Jeff Cummings
Re: [OMPI users] How closely tied is a specific release of OpenMPI to the host operating system and other system software?
Thanks for all the good replies on this thread. I don't know if I'll be able to make a dent in the corporate IT bureaucracy, but I'm going to try.

From: Prentice Bisbal <prent...@ias.edu>
To: Open MPI Users <us...@open-mpi.org>
Date: 02/02/2011 11:35 AM
Subject: Re: [OMPI users] How closely tied is a specific release of OpenMPI to the host operating system and other system software?

Jeffrey A Cummings wrote:
> I use OpenMPI on a variety of platforms: stand-alone servers running
> Solaris on SPARC boxes and Linux (mostly CentOS) on AMD/Intel boxes,
> also Linux (again CentOS) on large clusters of AMD/Intel boxes. These
> platforms all have some version of the 1.3 OpenMPI stream. I recently
> requested an upgrade on all systems to 1.4.3 (for production work) and
> 1.5.1 (for experimentation). I'm getting a lot of push back from the
> SysAdmin folks claiming that OpenMPI is closely intertwined with the
> specific version of the operating system and/or other system software
> (i.e., Rocks on the clusters). I need to know if they are telling me
> the truth or if they're just making excuses to avoid the work. To state
> my question another way: apparently each release of Linux and/or Rocks
> comes with some version of OpenMPI bundled in. Is it dangerous in some
> way to upgrade to a newer version of OpenMPI? Thanks in advance for any
> insight anyone can provide.
>
> - Jeff

Jeff,

OpenMPI is more or less a user-space program and isn't tightly coupled to the OS at all. As long as the OS has the correct network drivers (Ethernet, IB, or other), that's all OpenMPI needs to do its job. In fact, you can install it yourself in your own home directory (if your home directory is shared among the cluster nodes you want to use) and run it from there; no special privileges needed. I have many different versions of OpenMPI installed on my systems without a problem.
Speaking as a system administrator responsible for maintaining OpenMPI on several clusters, it sounds like one of two things:

1. Your system administrators really don't know what they're talking about, or
2. They're lying to you to avoid doing work.

-- Prentice
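Prentice's point that OpenMPI is user-space software can be made concrete: a per-user install needs no root privileges at all. A sketch (the version matches the one requested in this thread; the prefix path is an arbitrary example):

```shell
# Build and install Open MPI under $HOME; no root needed.
tar xjf openmpi-1.4.3.tar.bz2
cd openmpi-1.4.3
./configure --prefix=$HOME/openmpi-1.4.3
make -j4 && make install

# Put this copy first on the search paths so it shadows any
# system-bundled version that the OS or Rocks installed.
export PATH=$HOME/openmpi-1.4.3/bin:$PATH
export LD_LIBRARY_PATH=$HOME/openmpi-1.4.3/lib:$LD_LIBRARY_PATH
```

As long as the home directory is NFS-shared across the cluster nodes, mpirun from this install works cluster-wide without touching the system packages.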
Re: [OMPI users] OpenMPI version syntax?
The context was the OpenMPI version bundled with a specific version of CentOS Linux, which my IT folks are about to install on one of our servers. Since the most recent 1.4-stream version is 1.4.3, I'm afraid that 1.4-4 is really some variant of 1.4 (i.e., 1.4.0) and hence not that new.

From: Jeff Squyres <jsquy...@cisco.com>
To: Open MPI Users <us...@open-mpi.org>
Date: 02/02/2011 07:38 PM
Subject: Re: [OMPI users] OpenMPI version syntax?

On Feb 2, 2011, at 1:44 PM, Jeffrey A Cummings wrote:
> I've encountered a supposed OpenMPI version of 1.4-4. Is the hyphen a typo or is this syntax correct, and if so, what does it mean?

Is this an RPM version number? It's fairly common for RPMs to add "-X" at the end of the version number. The "X" indicates the RPM release number (i.e., the version number of the packaging, not the package itself). Open MPI's version number scheme is explained here: http://www.open-mpi.org/software/ompi/versions/

-- Jeff Squyres
jsquy...@cisco.com
[OMPI users] OpenMPI version syntax?
I've encountered a supposed OpenMPI version of 1.4-4. Is the hyphen a typo or is this syntax correct, and if so, what does it mean? - Jeff
[OMPI users] How closely tied is a specific release of OpenMPI to the host operating system and other system software?
I use OpenMPI on a variety of platforms: stand-alone servers running Solaris on SPARC boxes and Linux (mostly CentOS) on AMD/Intel boxes, also Linux (again CentOS) on large clusters of AMD/Intel boxes. These platforms all have some version of the 1.3 OpenMPI stream. I recently requested an upgrade on all systems to 1.4.3 (for production work) and 1.5.1 (for experimentation). I'm getting a lot of push back from the SysAdmin folks claiming that OpenMPI is closely intertwined with the specific version of the operating system and/or other system software (i.e., Rocks on the clusters). I need to know if they are telling me the truth or if they're just making excuses to avoid the work. To state my question another way: apparently each release of Linux and/or Rocks comes with some version of OpenMPI bundled in. Is it dangerous in some way to upgrade to a newer version of OpenMPI? Thanks in advance for any insight anyone can provide. - Jeff