Re: [OMPI users] [EXTERNAL] strange pml error
this seemed to help me as well, so far at least. still have a lot more testing to do On Tue, Nov 2, 2021 at 4:15 PM Shrader, David Lee wrote: > > As a workaround for now, I have found that setting OMPI_MCA_pml=ucx seems to > get around this issue. I'm not sure why this works, but perhaps there is > different initialization that happens such that the offending device search > problem doesn't occur? > > > Thanks, > > David > > > > > From: Shrader, David Lee > Sent: Tuesday, November 2, 2021 2:09 PM > To: Open MPI Users > Cc: Michael Di Domenico > Subject: Re: [EXTERNAL] [OMPI users] strange pml error > > > I too have been getting this using 4.1.1, but not with the master nightly > tarballs from mid-October. I still have it on my to-do list to open a github > issue. The problem seems to come from device detection in the ucx pml: on > some ranks, it fails to find a device and thus the ucx pml disqualifies > itself. Which then just leaves the ob1 pml. > > > Thanks, > > David > > > > > From: users on behalf of Michael Di > Domenico via users > Sent: Tuesday, November 2, 2021 1:35 PM > To: Open MPI Users > Cc: Michael Di Domenico > Subject: [EXTERNAL] [OMPI users] strange pml error > > fairly frequently, but not everytime when trying to run xhpl on a new > machine i'm bumping into this. it happens with a single node or > multiple nodes > > node1 selected pml ob1, but peer on node1 selected pml ucx > > if i rerun the exact same command a few minutes later, it works fine. > the machine is new and i'm the only one using it so there are no user > conflicts > > the software stack is > > slurm 21.8.2.1 > ompi 4.1.1 > pmix 3.2.3 > ucx 1.9.0 > > the hardware is HPE w/ mellanox edr cards (but i doubt that matters) > > any thoughts?
[OMPI users] strange pml error
fairly frequently, but not every time, when trying to run xhpl on a new machine i'm bumping into this. it happens with a single node or multiple nodes

node1 selected pml ob1, but peer on node1 selected pml ucx

if i rerun the exact same command a few minutes later, it works fine. the machine is new and i'm the only one using it, so there are no user conflicts

the software stack is

slurm 21.8.2.1
ompi 4.1.1
pmix 3.2.3
ucx 1.9.0

the hardware is HPE w/ mellanox edr cards (but i doubt that matters)

any thoughts?
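A reply in this thread reports that pinning the PML works around the mismatch. A minimal sketch of both ways to pin it (the rank count and xhpl binary are just the example from this post; treat this as an illustration, not a verified fix):

```shell
# Pin the PML so every rank makes the same selection and the
# "node1 selected pml ob1, but peer on node1 selected pml ucx"
# disagreement cannot occur. Either export the MCA variable...
export OMPI_MCA_pml=ucx

# ...or pass the equivalent flag on the command line (shown here as a
# string so the sketch runs without an MPI install):
launch="mpirun --mca pml ucx -n 16 ./xhpl"
echo "$launch"
```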
Re: [OMPI users] [EXTERNAL] building openshem on opa
On Mon, Mar 22, 2021 at 11:13 AM Pritchard Jr., Howard wrote: > https://github.com/Sandia-OpenSHMEM/SOS > if you want to use OpenSHMEM over OPA. > If you have lots of cycles for development work, you could write an OFI SPML > for the OSHMEM component of Open MPI. thanks, i am aware of the sandia version. the devs in my organization don't really use shmem, but there was a call for it recently. i hadn't even noticed shmem didn't build on our opa cluster. for now we have a smaller mellanox cluster they can build against. my ability to code an spml is nil. but if we had more interest from the internal devs i'd certainly be willing to fund someone to do it.. :)
[OMPI users] building openshem on opa
i can build and run openmpi on an opa network just fine, but it turns out building openshmem fails. the message is "no spml found". looking at the config log, it looks like it tries to build the spml components ikrit and ucx, which fail. i turn ucx off because it doesn't support opa and isn't needed.

so is this message really just confirmation that openshmem and opa cannot be built together, or did i do something wrong? and out of curiosity, does anyone know what kind of effort would be involved in getting it to work?
Re: [OMPI users] [EXTERNAL] Re: OpenMPI 4.0.5 error with Omni-path
if you have OPA cards, for openmpi you only need --with-ofi; you don't need psm/psm2/verbs/ucx. but this assumes you're running a rhel-based distro and have installed the OPA fabric suite of software from Intel/CornelisNetworks, which is what i have. perhaps there's something really odd in debian, or there's an incompatibility with the older ofed drivers included with debian. unfortunately i don't have access to a debian machine, so i can't be much more help.

if i had to guess, totally pulling junk from the air, there's probably something incompatible between PSM and OPA when running specifically on debian (likely due to library versioning). i don't know how common that setup is, so it's not clear how fleshed out and tested it is.

On Wed, Jan 27, 2021 at 3:07 PM Patrick Begou via users wrote:
>
> Hi Howard and Michael
>
> first many thanks for testing with my short application. Yes, when the
> test code runs fine it just show the max RSS size of rank 0 process.
> When it runs wrong it put a messages about each invalid value found.
>
> As I said, I have also deployed OpenMPI on various cluster (in DELL data
> center at Austin) when I was testing some architectures some months ago
> and nor on AMD/Mellanox_IB nor on Intel/Omni-path I got any problem. The
> goal was running my tests with same software stacks and be sure to be
> able to deploy my software stack on the selected solution.
> But as your clusters (and my small local clusters) they were all running
> RedHat (or similar Linux flavors) and a modern Gnu compiler (9 or 10).
> The university's cluster I have access is running Debian stretch and
> provides GCC6 as default compiler.
>
> I cannot ask for a different OS, but I can deploy a local gcc10 and
> build again OpenMPI. UCX is not available on this cluster, should I
> deploy a local UCX too ?
>
> Libpsm2 seems good:
> dahu103 : dpkg -l | grep psm
> ii  libfabric-psm           1.10.0-2-1ifs+deb9         amd64  Dynamic PSM provider for user-space Open Fabric Interfaces
> ii  libfabric-psm2          1.10.0-2-1ifs+deb9         amd64  Dynamic PSM2 provider for user-space Open Fabric Interfaces
> ii  libpsm-infinipath1      3.3-19-g67c0807-2ifs+deb9  amd64  PSM Messaging library for Intel Truescale adapters
> ii  libpsm-infinipath1-dev  3.3-19-g67c0807-2ifs+deb9  amd64  Development files for libpsm-infinipath1
> ii  libpsm2-2               11.2.185-1-1ifs+deb9       amd64  Intel PSM2 Libraries
> ii  libpsm2-2-compat        11.2.185-1-1ifs+deb9       amd64  Compat library for Intel PSM2
> ii  libpsm2-dev             11.2.185-1-1ifs+deb9       amd64  Development files for Intel PSM2
> ii  psmisc                  22.21-2.1+b2               amd64  utilities that use the proc file system
>
> This will be my next try to install OpenMPI on this cluster.
>
> Patrick
>
> Le 27/01/2021 à 18:09, Pritchard Jr., Howard via users a écrit :
> > Hi Folks,
> >
> > I'm also having problems reproducing this on one of our OPA clusters:
> >
> > libpsm2-11.2.78-1.el7.x86_64
> > libpsm2-devel-11.2.78-1.el7.x86_64
> >
> > cluster runs RHEL 7.8
> >
> > hca_id: hfi1_0
> >         transport:        InfiniBand (0)
> >         fw_ver:           1.27.0
> >         node_guid:        0011:7501:0179:e2d7
> >         sys_image_guid:   0011:7501:0179:e2d7
> >         vendor_id:        0x1175
> >         vendor_part_id:   9456
> >         hw_ver:           0x11
> >         board_id:         Intel Omni-Path Host Fabric Interface Adapter 100 Series
> >         phys_port_cnt:    1
> >         port: 1
> >                 state:       PORT_ACTIVE (4)
> >                 max_mtu:     4096 (5)
> >                 active_mtu:  4096 (5)
> >                 sm_lid:      1
> >                 port_lid:    99
> >                 port_lmc:    0x00
> >                 link_layer:  InfiniBand
> >
> > using gcc/gfortran 9.3.0
> >
> > Built Open MPI 4.0.5 without any special configure options.
> >
> > Howard
> >
> > On 1/27/21, 9:47 AM, "users on behalf of Michael Di Domenico via users" wrote:
> >
> >     for whatever it's worth, running the test program on my OPA cluster
> >     seems to work. well, it keeps spitting out [INFO MEMORY] lines, not
> >     sure if it's supposed to stop at some point
> >
> >     i'm running rhel7, gcc 10.1, openmpi 4.0.5rc2, with-ofi,
> >     without-{psm,ucx,verbs}
Re: [OMPI users] OpenMPI 4.0.5 error with Omni-path
for whatever it's worth running the test program on my OPA cluster seems to work. well it keeps spitting out [INFO MEMORY] lines, not sure if it's supposed to stop at some point i'm running rhel7, gcc 10.1, openmpi 4.0.5rc2, with-ofi, without-{psm,ucx,verbs} On Tue, Jan 26, 2021 at 3:44 PM Patrick Begou via users wrote: > > Hi Michael > > indeed I'm a little bit lost with all these parameters in OpenMPI, mainly > because for years it works just fine out of the box in all my deployments on > various architectures, interconnects and linux flavor. Some weeks ago I > deploy OpenMPI4.0.5 in Centos8 with gcc10, slurm and UCX on an AMD epyc2 > cluster with connectX6, and it just works fine. It is the first time I've > such trouble to deploy this library. > > If you have my mail posted the 25/01/2021 in this discussion at 18h54 (may > be Paris TZ) there is a small test case attached that show the problem. Did > you got it or did the list strip these attachments ? I can provide it again. > > Many thanks > > Patrick > > Le 26/01/2021 à 19:25, Heinz, Michael William a écrit : > > Patrick how are you using original PSM if you’re using Omni-Path hardware? > The original PSM was written for QLogic DDR and QDR Infiniband adapters. > > As far as needing openib - the issue is that the PSM2 MTL doesn’t support a > subset of MPI operations that we previously used the pt2pt BTL for. For > recent version of OMPI, the preferred BTL to use with PSM2 is OFI. > > Is there any chance you can give us a sample MPI app that reproduces the > problem? I can’t think of another way I can give you more help without being > able to see what’s going on. It’s always possible there’s a bug in the PSM2 > MTL but it would be surprising at this point. > > Sent from my iPad > > On Jan 26, 2021, at 1:13 PM, Patrick Begou via users > wrote: > > > Hi all, > > I ran many tests today. I saw that an older 4.0.2 version of OpenMPI packaged > with Nix was running using openib. 
So I add the --with-verbs option to setup this module.
>
> That I can see now is that:
>
> mpirun -hostfile $OAR_NODEFILE --mca mtl psm -mca btl_openib_allow_ib true
>
> - the testcase test_layout_array is running without error
> - the bandwidth measured with osu_bw is half of what it should be:
>
> # OSU MPI Bandwidth Test v5.7
> # Size      Bandwidth (MB/s)
> 1                       0.54
> 2                       1.13
> 4                       2.26
> 8                       4.51
> 16                      9.06
> 32                     17.93
> 64                     33.87
> 128                    69.29
> 256                   161.24
> 512                   333.82
> 1024                  682.66
> 2048                 1188.63
> 4096                 1760.14
> 8192                 2166.08
> 16384                2036.95
> 32768                3466.63
> 65536                6296.73
> 131072               7509.43
> 262144               9104.78
> 524288               6908.55
> 1048576              5530.37
> 2097152              4489.16
> 4194304              3498.14
>
> mpirun -hostfile $OAR_NODEFILE --mca mtl psm2 -mca btl_openib_allow_ib true
> ...
>
> - the testcase test_layout_array is not giving correct results
> - the bandwidth measured with osu_bw is the right one:
>
> # OSU MPI Bandwidth Test v5.7
> # Size      Bandwidth (MB/s)
> 1                       3.73
> 2                       7.96
> 4                      15.82
> 8                      31.22
> 16                     51.52
> 32                    107.61
> 64                    196.51
> 128                   438.66
> 256                   817.70
> 512                  1593.90
> 1024                 2786.09
> 2048                 4459.77
> 4096                 6658.70
> 8192                 8092.95
> 16384                8664.43
> 32768                8495.96
> 65536               11458.77
> 131072              12094.64
> 262144              11781.84
> 524288              12297.58
> 1048576             12346.92
> 2097152             12206.53
> 4194304             12167.00
>
> But yes, I know openib is deprecated too in 4.0.5.
>
> Patrick
[OMPI users] openmpi/pmix/ucx
i haven't compiled openmpi in a while, but i'm in the process of upgrading our cluster. the last time i did this there were specific versions of ompi/pmix/ucx that were all tested and supposed to work together. my understanding was that this was because pmix/ucx were under rapid development and the APIs were changing.

is that still an issue, or can i take the latest stable branches from git for each and have a relatively good shot at it all working together? the one semi-immovable piece i have right now is ucx, which is at 1.7.0 as installed by mellanox ofed.

if the above is true, is there a matrix of versions i should be using for all the others? nothing jumped out at me on the openmpi website
Re: [OMPI users] local rank to rank comms
unfortunately it takes a while to export the data, but here's what i see On Mon, Mar 11, 2019 at 11:02 PM Gilles Gouaillardet wrote: > > Michael, > > > this is odd, I will have a look. > > Can you confirm you are running on a single node ? > > > At first, you need to understand which component is used by Open MPI for > communications. > > There are several options here, and since I do not know how Open MPI was > built, nor which dependencies are installed, > > I can only list a few > > > - pml/cm uses mtl/psm2 => omnipath is used for both inter and intra node > communications > > - pml/cm uses mtl/ofi => libfabric is used for both inter and intra node > communications. it definitely uses libpsm2 for inter node > communications, and I do not know enough about the internals to tell how > inter communications are handled > > - pml/ob1 is used, I guess it uses btl/ofi for inter node communications > and btl/vader for intra node communications (in that case the NIC device > is not used for intra node communications > > there could be other I am missing (does UCX support OmniPath ? could > btl/ofi also be used for intra node communications ?) > > > mpirun --mca pml_base_verbose 10 --mca btl_base_verbose 10 --mca > mtl_base_verbose 10 ... > > should tell you what is used (feel free to compress and post the full > output if you have some hard time understanding the logs) > > > Cheers, > > > Gilles > > On 3/12/2019 1:41 AM, Michael Di Domenico wrote: > > On Mon, Mar 11, 2019 at 12:09 PM Gilles Gouaillardet > > wrote: > >> You can force > >> mpirun --mca pml ob1 ... > >> And btl/vader (shared memory) will be used for intra node communications > >> ... 
unless MPI tasks are from different jobs (read MPI_Comm_spawn())
> > if i run
> >
> > mpirun -n 16 IMB-MPI1 alltoallv
> >
> > things run fine, 12us on average for all ranks
> >
> > if i run
> >
> > mpirun -n 16 --mca pml ob1 IMB-MPI1 alltoallv
> >
> > the program runs, but then it hangs at "List of benchmarks to run:
> > #Alltoallv" and no tests run

[attachments: ompi.run.ob1, ompi.run.cm]

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
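The three verbose knobs Gilles lists above can go on a single launch line; a hedged sketch (the benchmark invocation is the one from this thread, shown as a string so the snippet runs without an MPI install):

```shell
# Component-selection debugging for the pml, btl, and mtl frameworks;
# the verbose output names which component each rank actually selected,
# which is the first step in understanding intra-node routing.
verbose="--mca pml_base_verbose 10 --mca btl_base_verbose 10 --mca mtl_base_verbose 10"
echo "mpirun $verbose -n 16 IMB-MPI1 alltoallv"
```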
Re: [OMPI users] local rank to rank comms
On Mon, Mar 11, 2019 at 12:09 PM Gilles Gouaillardet wrote:
> You can force
> mpirun --mca pml ob1 ...
> And btl/vader (shared memory) will be used for intra node communications ...
> unless MPI tasks are from different jobs (read MPI_Comm_spawn())

if i run

mpirun -n 16 IMB-MPI1 alltoallv

things run fine, 12us on average for all ranks

if i run

mpirun -n 16 --mca pml ob1 IMB-MPI1 alltoallv

the program runs, but then it hangs at "List of benchmarks to run:
#Alltoallv" and no tests run
Re: [OMPI users] local rank to rank comms
On Mon, Mar 11, 2019 at 12:19 PM Ralph H Castain wrote:
> OFI uses libpsm2 underneath it when omnipath detected
>
> > On Mar 11, 2019, at 9:06 AM, Gilles Gouaillardet wrote:
> > It might show that pml/cm and mtl/psm2 are used. In that case, then yes,
> > the OmniPath library is used even for intra node communications. If this
> > library is optimized for intra node, then it will internally use shared
> > memory instead of the NIC.

would it be fair to assume that, if the opa library is optimized for intra-node traffic using shared memory, there shouldn't be much of a difference between the opa library and the ompi library for local rank-to-rank comms?

is there a way or tool to measure that? i'd like to run the tests toggling opa vs ompi libraries and see if, or really how much of, a difference there is
Re: [OMPI users] local rank to rank comms
On Mon, Mar 11, 2019 at 11:51 AM Ralph H Castain wrote:
> You are probably using the ofi mtl - could be psm2 uses loopback method?

according to ompi_info i do in fact have the ofi, psm, and psm2 mtl's. i haven't changed any of the defaults, so are you saying that in order to change the behaviour i have to run mpirun --mca mtl psm2? if true, what's the recourse to not using the ofi mtl?
[OMPI users] local rank to rank comms
i have a user that's claiming when two ranks on the same node want to talk with each other, they're using the NIC to talk rather than just talking directly. i've never had to test such a scenario.

is there a way for me to prove one way or another whether two ranks are talking through, say, the kernel (or however it actually works) or using the nic? i didn't set any flags when i compiled openmpi to change this.

i'm running ompi 3.1, pmix 2.2.1, and slurm 18.05 running atop omnipath
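One way to settle the question (a sketch, not something from this thread): restrict the run to the self and shared-memory BTLs only. If a single-node job still runs under this restriction, same-node ranks cannot be touching the NIC; the same command spread across two nodes should fail for lack of a network path. The benchmark name is a placeholder, and the command is echoed so the sketch runs without an MPI install.

```shell
# Force ob1 and allow only loopback + shared-memory transports;
# vader is the shared-memory BTL in Open MPI 3.x.
restrict="--mca pml ob1 --mca btl self,vader"
echo "mpirun $restrict -n 2 ./osu_latency"
```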
Re: [OMPI users] pmix and srun
seems to be better now. jobs are running On Fri, Jan 18, 2019 at 6:17 PM Ralph H Castain wrote: > > I have pushed a fix to the v2.2 branch - could you please confirm it? > > > > On Jan 18, 2019, at 2:23 PM, Ralph H Castain wrote: > > > > Aha - I found it. It’s a typo in the v2.2.1 release. Sadly, our Slurm > > plugin folks seem to be off somewhere for awhile and haven’t been testing > > it. Sigh. > > > > I’ll patch the branch and let you know - we’d appreciate the feedback. > > Ralph > > > > > >> On Jan 18, 2019, at 2:09 PM, Michael Di Domenico > >> wrote: > >> > >> here's the branches i'm using. i did a git clone on the repo's and > >> then a git checkout > >> > >> [ec2-user@labhead bin]$ cd /hpc/src/pmix/ > >> [ec2-user@labhead pmix]$ git branch > >> master > >> * v2.2 > >> [ec2-user@labhead pmix]$ cd ../slurm/ > >> [ec2-user@labhead slurm]$ git branch > >> * (detached from origin/slurm-18.08) > >> master > >> [ec2-user@labhead slurm]$ cd ../ompi/ > >> [ec2-user@labhead ompi]$ git branch > >> * (detached from origin/v3.1.x) > >> master > >> > >> > >> attached is the debug out from the run with the debugging turned on > >> > >> On Fri, Jan 18, 2019 at 4:30 PM Ralph H Castain wrote: > >>> > >>> Looks strange. I’m pretty sure Mellanox didn’t implement the event > >>> notification system in the Slurm plugin, but you should only be trying to > >>> call it if OMPI is registering a system-level event code - which OMPI 3.1 > >>> definitely doesn’t do. > >>> > >>> If you are using PMIx v2.2.0, then please note that there is a bug in it > >>> that slipped through our automated testing. I replaced it today with > >>> v2.2.1 - you probably should update if that’s the case. However, that > >>> wouldn’t necessarily explain this behavior. 
I’m not that familiar with > >>> the Slurm plugin, but you might try adding > >>> > >>> PMIX_MCA_pmix_client_event_verbose=5 > >>> PMIX_MCA_pmix_server_event_verbose=5 > >>> OMPI_MCA_pmix_base_verbose=10 > >>> > >>> to your environment and see if that provides anything useful. > >>> > >>>> On Jan 18, 2019, at 12:09 PM, Michael Di Domenico > >>>> wrote: > >>>> > >>>> i compilied pmix slurm openmpi > >>>> > >>>> ---pmix > >>>> ./configure --prefix=/hpc/pmix/2.2 --with-munge=/hpc/munge/0.5.13 > >>>> --disable-debug > >>>> ---slurm > >>>> ./configure --prefix=/hpc/slurm/18.08 --with-munge=/hpc/munge/0.5.13 > >>>> --with-pmix=/hpc/pmix/2.2 > >>>> ---openmpi > >>>> ./configure --prefix=/hpc/ompi/3.1 --with-hwloc=external > >>>> --with-libevent=external --with-slurm=/hpc/slurm/18.08 > >>>> --with-pmix=/hpc/pmix/2.2 > >>>> > >>>> everything seemed to compile fine, but when i do an srun i get the > >>>> below errors, however, if i salloc and then mpirun it seems to work > >>>> fine. i'm not quite sure where the breakdown is or how to debug it > >>>> > >>>> --- > >>>> > >>>> [ec2-user@labcmp1 linux]$ srun --mpi=pmix_v2 -n 16 xhpl > >>>> [labcmp6:18353] PMIX ERROR: NOT-SUPPORTED in file > >>>> event/pmix_event_registration.c at line 101 > >>>> [labcmp6:18355] PMIX ERROR: NOT-SUPPORTED in file > >>>> event/pmix_event_registration.c at line 101 > >>>> [labcmp5:18355] PMIX ERROR: NOT-SUPPORTED in file > >>>> event/pmix_event_registration.c at line 101 > >>>> -- > >>>> It looks like MPI_INIT failed for some reason; your parallel process is > >>>> likely to abort. There are many reasons that a parallel process can > >>>> fail during MPI_INIT; some of which are due to configuration or > >>>> environment > >>>> problems. 
This failure appears to be an internal failure; here's some > >>>> additional information (which may only be relevant to an Open MPI > >>>> developer): > >>>> > >>>> ompi_interlib_declare > >>>> --> Returned "Would block" (-10) instead of "Success" (0) > >>>> ...snipped.
Re: [OMPI users] Fwd: pmix and srun
here's the branches i'm using. i did a git clone on the repo's and then a git checkout [ec2-user@labhead bin]$ cd /hpc/src/pmix/ [ec2-user@labhead pmix]$ git branch master * v2.2 [ec2-user@labhead pmix]$ cd ../slurm/ [ec2-user@labhead slurm]$ git branch * (detached from origin/slurm-18.08) master [ec2-user@labhead slurm]$ cd ../ompi/ [ec2-user@labhead ompi]$ git branch * (detached from origin/v3.1.x) master attached is the debug out from the run with the debugging turned on On Fri, Jan 18, 2019 at 4:30 PM Ralph H Castain wrote: > > Looks strange. I’m pretty sure Mellanox didn’t implement the event > notification system in the Slurm plugin, but you should only be trying to > call it if OMPI is registering a system-level event code - which OMPI 3.1 > definitely doesn’t do. > > If you are using PMIx v2.2.0, then please note that there is a bug in it that > slipped through our automated testing. I replaced it today with v2.2.1 - you > probably should update if that’s the case. However, that wouldn’t necessarily > explain this behavior. I’m not that familiar with the Slurm plugin, but you > might try adding > > PMIX_MCA_pmix_client_event_verbose=5 > PMIX_MCA_pmix_server_event_verbose=5 > OMPI_MCA_pmix_base_verbose=10 > > to your environment and see if that provides anything useful. > > > On Jan 18, 2019, at 12:09 PM, Michael Di Domenico > > wrote: > > > > i compilied pmix slurm openmpi > > > > ---pmix > > ./configure --prefix=/hpc/pmix/2.2 --with-munge=/hpc/munge/0.5.13 > > --disable-debug > > ---slurm > > ./configure --prefix=/hpc/slurm/18.08 --with-munge=/hpc/munge/0.5.13 > > --with-pmix=/hpc/pmix/2.2 > > ---openmpi > > ./configure --prefix=/hpc/ompi/3.1 --with-hwloc=external > > --with-libevent=external --with-slurm=/hpc/slurm/18.08 > > --with-pmix=/hpc/pmix/2.2 > > > > everything seemed to compile fine, but when i do an srun i get the > > below errors, however, if i salloc and then mpirun it seems to work > > fine. 
i'm not quite sure where the breakdown is or how to debug it > > > > --- > > > > [ec2-user@labcmp1 linux]$ srun --mpi=pmix_v2 -n 16 xhpl > > [labcmp6:18353] PMIX ERROR: NOT-SUPPORTED in file > > event/pmix_event_registration.c at line 101 > > [labcmp6:18355] PMIX ERROR: NOT-SUPPORTED in file > > event/pmix_event_registration.c at line 101 > > [labcmp5:18355] PMIX ERROR: NOT-SUPPORTED in file > > event/pmix_event_registration.c at line 101 > > -- > > It looks like MPI_INIT failed for some reason; your parallel process is > > likely to abort. There are many reasons that a parallel process can > > fail during MPI_INIT; some of which are due to configuration or environment > > problems. This failure appears to be an internal failure; here's some > > additional information (which may only be relevant to an Open MPI > > developer): > > > > ompi_interlib_declare > > --> Returned "Would block" (-10) instead of "Success" (0) > > ...snipped... > > [labcmp6:18355] *** An error occurred in MPI_Init > > [labcmp6:18355] *** reported by process [140726281390153,15] > > [labcmp6:18355] *** on a NULL communicator > > [labcmp6:18355] *** Unknown error > > [labcmp6:18355] *** MPI_ERRORS_ARE_FATAL (processes in this > > communicator will now abort, > > [labcmp6:18355] ***and potentially your MPI job) > > [labcmp6:18352] *** An error occurred in MPI_Init > > [labcmp6:18352] *** reported by process [1677936713,12] > > [labcmp6:18352] *** on a NULL communicator > > [labcmp6:18352] *** Unknown error > > [labcmp6:18352] *** MPI_ERRORS_ARE_FATAL (processes in this > > communicator will now abort, > > [labcmp6:18352] ***and potentially your MPI job) > > [labcmp6:18354] *** An error occurred in MPI_Init > > [labcmp6:18354] *** reported by process [140726281390153,14] > > [labcmp6:18354] *** on a NULL communicator > > [labcmp6:18354] *** Unknown error > > [labcmp6:18354] *** MPI_ERRORS_ARE_FATAL (processes in this > > communicator will now abort, > > [labcmp6:18354] ***and potentially your MPI 
job) > > srun: Job step aborted: Waiting up to 32 seconds for job step to finish. > > slurmstepd: error: *** STEP 24.0 ON labcmp3 CANCELLED AT > > 2019-01-18T20:03:33 *** > > [labcmp5:18358] PMIX ERROR: NOT-SUPPORTED in file > > event/pmix_event_registration.c at line 101 > > -- > > It looks like MPI_INIT failed for some reason; your parallel process is > > likely to abort.
[OMPI users] Fwd: pmix and srun
i compiled pmix, slurm, and openmpi:

--- pmix
./configure --prefix=/hpc/pmix/2.2 --with-munge=/hpc/munge/0.5.13 --disable-debug
--- slurm
./configure --prefix=/hpc/slurm/18.08 --with-munge=/hpc/munge/0.5.13 --with-pmix=/hpc/pmix/2.2
--- openmpi
./configure --prefix=/hpc/ompi/3.1 --with-hwloc=external --with-libevent=external --with-slurm=/hpc/slurm/18.08 --with-pmix=/hpc/pmix/2.2

everything seemed to compile fine, but when i do an srun i get the below errors; however, if i salloc and then mpirun, it seems to work fine. i'm not quite sure where the breakdown is or how to debug it

---

[ec2-user@labcmp1 linux]$ srun --mpi=pmix_v2 -n 16 xhpl
[labcmp6:18353] PMIX ERROR: NOT-SUPPORTED in file event/pmix_event_registration.c at line 101
[labcmp6:18355] PMIX ERROR: NOT-SUPPORTED in file event/pmix_event_registration.c at line 101
[labcmp5:18355] PMIX ERROR: NOT-SUPPORTED in file event/pmix_event_registration.c at line 101
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_interlib_declare
  --> Returned "Would block" (-10) instead of "Success" (0)

...snipped...
[labcmp6:18355] *** An error occurred in MPI_Init [labcmp6:18355] *** reported by process [140726281390153,15] [labcmp6:18355] *** on a NULL communicator [labcmp6:18355] *** Unknown error [labcmp6:18355] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, [labcmp6:18355] ***and potentially your MPI job) [labcmp6:18352] *** An error occurred in MPI_Init [labcmp6:18352] *** reported by process [1677936713,12] [labcmp6:18352] *** on a NULL communicator [labcmp6:18352] *** Unknown error [labcmp6:18352] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, [labcmp6:18352] ***and potentially your MPI job) [labcmp6:18354] *** An error occurred in MPI_Init [labcmp6:18354] *** reported by process [140726281390153,14] [labcmp6:18354] *** on a NULL communicator [labcmp6:18354] *** Unknown error [labcmp6:18354] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, [labcmp6:18354] ***and potentially your MPI job) srun: Job step aborted: Waiting up to 32 seconds for job step to finish. slurmstepd: error: *** STEP 24.0 ON labcmp3 CANCELLED AT 2019-01-18T20:03:33 *** [labcmp5:18358] PMIX ERROR: NOT-SUPPORTED in file event/pmix_event_registration.c at line 101 -- It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. 
This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  ompi_interlib_declare
  --> Returned "Would block" (-10) instead of "Success" (0)
--
[labcmp5:18357] PMIX ERROR: NOT-SUPPORTED in file event/pmix_event_registration.c at line 101
[labcmp5:18356] PMIX ERROR: NOT-SUPPORTED in file event/pmix_event_registration.c at line 101
srun: error: labcmp6: tasks 12-15: Exited with exit code 1
srun: error: labcmp3: tasks 0-3: Killed
srun: error: labcmp4: tasks 4-7: Killed
srun: error: labcmp5: tasks 8-11: Killed
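For launch failures like the one above, two checks are worth scripting; a sketch (the verbosity variables are the ones Ralph suggests in his reply earlier in this archive):

```shell
# First, confirm the Slurm build actually offers the pmix_v2 plugin that
# "srun --mpi=pmix_v2" asks for (echoed as a string so this runs anywhere):
echo "srun --mpi=list"

# Then raise PMIx/OMPI event verbosity before re-running the job, so the
# NOT-SUPPORTED registration error is reported with more context:
export PMIX_MCA_pmix_client_event_verbose=5
export PMIX_MCA_pmix_server_event_verbose=5
export OMPI_MCA_pmix_base_verbose=10
```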
Re: [OMPI users] OpenFabrics warning
On Mon, Nov 12, 2018 at 8:08 AM Andrei Berceanu wrote:
>
> Running a CUDA+MPI application on a node with 2 K80 GPUs, I get the following warnings:
>
> --
> WARNING: There is at least non-excluded one OpenFabrics device found,
> but there are no active ports detected (or Open MPI was unable to use
> them). This is most certainly not what you wanted. Check your
> cables, subnet manager configuration, etc. The openib BTL will be
> ignored for this job.
>
> Local host: gpu01
> --
> [gpu01:107262] 1 more process has sent help message help-mpi-btl-openib.txt / no active ports found
> [gpu01:107262] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>
> Any idea of what is going on and how I can fix this?
> I am using OpenMPI 3.1.2.

looks like openmpi found something like an infiniband card in the compute node you're using, but it is not active/usable.

as for a fix, it depends. if you have an IB card, should it be active? if so, you'd have to check the connections to see why it's disabled.

if not, you can tell openmpi to disregard the IB ports, which will clear the warning, but that might mean you're potentially using a slower interface for message passing
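If the decision is to leave the card unused, the exclusion can be done with the BTL selection syntax; a sketch (the `^` prefix means "all components except these"):

```shell
# Tell Open MPI to skip the openib BTL entirely so it never probes the
# port-less IB device; this silences the warning quoted above.
export OMPI_MCA_btl="^openib"

# Command-line equivalent, echoed as a string so the sketch runs
# without an MPI install (the app name is a placeholder):
echo 'mpirun --mca btl ^openib -n 4 ./app'
```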
Re: [OMPI users] Problem running with UCX/oshmem on single node?
On Wed, May 9, 2018 at 9:45 PM, Howard Pritchard wrote:
>
> You either need to go and buy a connectx4/5 HCA from mellanox (and maybe a switch), and install that
> on your system, or else install xpmem (https://github.com/hjelmn/xpmem). Note there is a bug right now
> in UCX that you may hit if you try to go the xpmem only route:

How stringent is the Connect-X 4/5 requirement? i have Connect-X 3 cards; will they work?

during the configure step it seems to yell at me that mlx5 won't compile because i don't have Mellanox OFED v3.1 installed. is that also a requirement? (i'm using the RHEL7.4 bundled version of ofed, not the vendor versions)
[OMPI users] shmem
before i debug ucx further (cause it's totally not working for me), i figured i'd check to see if it's *really* required to use shmem inside of openmpi. i'm pretty sure the answer is yes, but i wanted to double check.
Re: [OMPI users] openmpi/slurm/pmix
On Mon, Apr 23, 2018 at 6:07 PM, r...@open-mpi.org wrote:
> Looks like the problem is that you didn’t wind up with the external PMIx. The
> component listed in your error is the internal PMIx one, which shouldn’t have
> built given that configure line.
>
> Check your config.out and see what happened. Also, ensure that your
> LD_LIBRARY_PATH is properly pointing to the installation, and that you built
> into a “clean” prefix.

the "clean prefix" part seemed to fix my issue, though i'm not exactly sure i understand why/how. i recompiled pmix and removed the old installation before doing a make install; when i recompiled openmpi it seems to have figured itself out.

i think things are still a little wonky, but at least that issue is gone
[OMPI users] openmpi/slurm/pmix
i'm trying to get slurm 17.11.5 and openmpi 3.0.1 working with pmix. everything compiled, but when i run something i get:

symbol lookup error: /openmpi/mca_pmix_pmix2x.so: undefined symbol: opal_libevent2022_evthread_use_pthreads

i'm more than sure i did something wrong, but i'm not sure what. here's what i did:

compile libevent 2.1.8
./configure --prefix=/libevent-2.1.8

compile pmix 2.1.0
./configure --prefix=/pmix-2.1.0 --with-psm2 --with-munge=/munge-0.5.13 --with-libevent=/libevent-2.1.8

compile openmpi
./configure --prefix=/openmpi-3.0.1 --with-slurm=/slurm-17.11.5 --with-hwloc=external --with-mxm=/opt/mellanox/mxm --with-cuda=/usr/local/cuda --with-pmix=/pmix-2.1.0 --with-libevent=/libevent-2.1.8

when i look at the symbols in the mca_pmix_pmix2x.so library the function is indeed undefined (U) in the output, but checking ldd against the library doesn't show anything missing

any thoughts?
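The symbol and linkage checks described at the end of the post can be scripted; a sketch (the plugin path is hypothetical, derived from the --prefix in the post, and the commands are echoed so the snippet runs on any machine):

```shell
# Hypothetical install path, based on the --prefix used above.
plugin="/openmpi-3.0.1/lib/openmpi/mca_pmix_pmix2x.so"

# List the undefined (U) libevent symbols the dynamic linker must resolve;
# a symbol prefixed opal_libevent2022_* belongs to Open MPI's bundled
# libevent and cannot be satisfied by an external libevent 2.1.8, which
# is why a stale internal-PMIx build in the prefix causes this error.
echo "nm -D $plugin | grep -i libevent"

# And check which libevent ldd actually resolves against:
echo "ldd $plugin | grep libevent"
```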
Re: [OMPI users] disabling libraries?
On Sat, Apr 7, 2018 at 3:50 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> On Apr 6, 2018, at 8:12 AM, Michael Di Domenico <mdidomeni...@gmail.com> wrote:
>> it would be nice if openmpi had (or may already have) a simple switch
>> that lets me disable entire portions of the library chain, ie this
>> host doesn't have a particular interconnect, so don't load any of the
>> libraries. this might run counter to how openmpi discovers and loads
>> libs though.
>
> We've actually been arguing about exactly how to do this for quite a while.
> It's complicated (I can explain further, if you care). :-\

I have no doubt it's complicated. I'm not overly interested in the details, but others, I'm sure, might be.

In reality you're correct: I don't care that Open MPI failed to load the libs, given that the job continues to run without issue. In fact, I don't even care about the warnings, but my users will complain and ask questions.

Achieving a single-build binary where I can disable the interconnects/libraries at runtime would be HIGHLY beneficial to me (and perhaps others as well). It cuts my build version combinations from about 12 down to 4 (or fewer); that's a huge reduction in labour/maintenance, which also means I can upgrade Open MPI quicker and stay more up to date.

I would gather this is probably not a high priority for the team working on Open MPI, but if there's something my organization or I can do to push this higher, let me know.

> That being said, I think we *do* have a workaround that might be good enough
> for you: disable those warnings about plugins not being able to be opened:
> mpirun --mca mca_component_show_load_errors 0 ...

This disabled:

mca_base_component_repository_open: unable to open mca_oob_ud: libibverbs.so.1

but not this:

pmix_mca_base_component_repository_open: unable to open mca_pnet_opa: libpsm2.so.2
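As an editorial aside (not from the thread): any Open MPI MCA parameter can also be set through the environment by prefixing its name with `OMPI_MCA_`, which makes the workaround above sticky without editing every mpirun invocation. The parameter name below is the one Jeff gave; some releases spell it `mca_base_component_show_load_errors`, so treat the exact name as an assumption to verify with `ompi_info --param`.

```shell
# Hedged sketch: the OMPI_MCA_<param> environment convention applied to
# the show-load-errors knob from the thread. Verify the exact parameter
# name for your release before relying on it.
export OMPI_MCA_mca_component_show_load_errors=0
echo "$OMPI_MCA_mca_component_show_load_errors"
```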
Re: [OMPI users] disabling libraries?
On Thu, Apr 5, 2018 at 7:59 PM, Gilles Gouaillardet wrote:
> That being said, the error suggests mca_oob_ud.so is a module from a
> previous install,
> Open MPI was not built on the system it is running on, or libibverbs.so.1
> has been removed after
> Open MPI was built.

Yes, understood. I compiled Open MPI on a node that has all the libraries installed for our various interconnects (opa/psm/mxm/ib), but I ran mpirun on a node that has none of them, so I get the resulting warnings:

mca_btl_openib: librdmacm.so.1
mca_btl_usnic: libfabric.so.1
mca_oob_ud: libibverbs.so.1
mca_mtl_mxm: libmxm.so.2
mca_mtl_ofi: libfabric.so.1
mca_mtl_psm: libpsm_infinipath.so.1
mca_mtl_psm2: libpsm2.so.2
mca_pml_yalla: libmxm.so.2

You referenced them as "errors" above, but MPI actually runs just fine for me even with these messages, so I would consider them more warnings.

> So I do encourage you to take a step back, and think if you can find a
> better solution for your site.

There are two alternatives:

1. I can compile a specific version of Open MPI for each of our clusters, with each cluster's specific interconnect libraries.
2. I can install all the libraries on all the machines, regardless of whether the interconnect is present.

Both are certainly plausible, but my effort here is to see if I can reduce the size of our software stack and/or reduce the number of compiled versions of Open MPI.

It would be nice if Open MPI had (or may already have) a simple switch that lets me disable entire portions of the library chain, i.e., this host doesn't have a particular interconnect, so don't load any of the libraries. This might run counter to how Open MPI discovers and loads libs, though.
[OMPI users] disabling libraries?
I'm trying to compile Open MPI to support all of our interconnects: psm/openib/mxm/etc. This works fine; Open MPI finds all the libs, compiles, and runs on each of the respective machines. However, we don't install the libraries for everything everywhere, so when I run things like ompi_info and mpirun I get:

mca_base_component_repository_open: unable to open mca_oob_ud: libibverbs.so.1: cannot open shared object file: no such file or directory (ignored)

and so on, for a bunch of other libs. I understand how the lib linking works, so this isn't unexpected and doesn't stop the MPI programs from running.

Here's the part I don't understand: how can I trace the above warning and others like it back to the required --mca parameters I need to add to the configuration to make the warnings go away?

As an aside, I believe I can set most of them via environment variables as well as the command line, but what I'd really like to do is set them from a file. I know I can create a default param file, but is there a way to feed a param file at invocation, depending on where mpirun is being run?
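On the default param file mentioned above: Open MPI reads a per-user `$HOME/.openmpi/mca-params.conf` at startup, so a minimal per-host sketch looks like the following. The component names in the exclusion lists are illustrative assumptions — derive the real ones from the `unable to open mca_<framework>_<component>` warnings each host actually prints.

```shell
# Sketch: a per-user MCA parameter file. The ^-prefix excludes the named
# components so Open MPI never tries to dlopen their missing libraries.
mkdir -p "$HOME/.openmpi"
cat > "$HOME/.openmpi/mca-params.conf" <<'EOF'
# Skip interconnect components this host has no libraries for
# (example component names -- match them to your warnings).
btl = ^openib,usnic
mtl = ^mxm,psm,psm2
EOF
grep -c '=' "$HOME/.openmpi/mca-params.conf"
```

Since the file lives under `$HOME`, one shared home directory gives every cluster the same file; for per-cluster contents, a site-wide file in the install prefix's `etc/openmpi-mca-params.conf` (or pointing the `mca_param_files`-style parameter at a custom path, name hedged by release) is the usual approach.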
[OMPI users] openmpi hang on IB disconnect
openmpi-2.0.2 running on RHEL 7.4 with qlogic QDR InfiniBand switches/adapters, also using Slurm.

I have a user that's running a job over multiple days. Unfortunately, after a few days, at random, the job will seemingly hang. The latest instance was caused by an InfiniBand adapter that went offline and online several times. The card is in a semi-working state at the moment; it's passing traffic, but I suspect some of the IB messages during the job run got lost and now the job is seemingly hung.

Is there some mechanism I can put in place to detect this condition, either in the code or on the system? It's causing two problems at the moment. First and foremost, the user has no idea the job hung, or for what reason. Second, it's wasting system time.

I'm sure other people have come across wonky IB cards; I'm curious how everyone else is detecting this condition and dealing with it.
[OMPI users] openmpi mgmt traffic
My cluster nodes are connected via 1G Ethernet (eth0/eth1) and via InfiniBand (RDMA and ib0).

My understanding is that Open MPI will detect all these interfaces, using eth0/eth1 for connection setup and RDMA for message passing.

What would be the appropriate command line parameters to tell Open MPI to use IPoIB for connection setup and RDMA for message passing? I effectively want to ignore the 1G Ethernet connections for everything.
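A hedged sketch of one way to do this (editorial suggestion, not from the thread): the `oob_tcp_if_include` and `btl_tcp_if_include` MCA parameters restrict which interfaces Open MPI's TCP components will use, so pointing both at the IPoIB interface keeps setup traffic off the 1G links while RDMA still carries messages. `ib0` and the exact parameter behavior across releases are assumptions to verify for your install.

```shell
# Sketch: pin Open MPI's TCP traffic (out-of-band/connection setup, and
# any TCP BTL fallback) to the IPoIB interface. "ib0" is assumed to be
# the site's IPoIB interface name.
IFACE=ib0
MCA_ARGS="--mca oob_tcp_if_include $IFACE --mca btl_tcp_if_include $IFACE"
echo mpirun $MCA_ARGS -n 64 ./a.out
```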
[OMPI users] alltoallv
I'm getting stuck trying to run some fairly large IMB-MPI alltoall tests under openmpi 2.0.2 on RHEL 7.4. I have two different clusters, one running Mellanox FDR10 and one running qlogic QDR.

If I issue:

mpirun -n 1024 ./IMB-MPI1 -npmin 1024 -iter 1 -mem 2.001 alltoallv

the job just stalls after the "List of Benchmarks to run: Alltoallv" line outputs from IMB-MPI. If I switch it to alltoall, the test does progress.

Often, when running various size alltoalls, I'll get "too many retries sending message to <>:<>, giving up".

I'm able to use InfiniBand just fine (our Lustre filesystem mounts over it) and I have other MPI programs running; it only seems to happen when I run alltoall-type primitives.

Any thoughts on debugging where the failures are? I might just need to turn up the debugging, but I'm not sure where.
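On "where to turn up the debugging": the standard MCA verbosity knobs for component selection and the transports are a reasonable first pass. A hedged sketch (my suggestion; 100 is just "very verbose", and the output lands on stderr of each rank):

```shell
# Sketch: raise selection/transport verbosity so the output shows which
# PML/BTL each rank picks and which peer pair hits the retry limit.
DEBUG_ARGS="--mca pml_base_verbose 100 --mca btl_base_verbose 100 --mca coll_base_verbose 100"
echo mpirun $DEBUG_ARGS -n 1024 ./IMB-MPI1 -npmin 1024 -iter 1 alltoallv
```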
Re: [OMPI users] disable slurm/munge from mpirun
On Thu, Jun 22, 2017 at 12:41 PM, r...@open-mpi.org wrote:
> I gather you are using OMPI 2.x, yes? And you configured it
> --with-pmi=, then moved the executables/libs to your
> workstation?

Correct.

> I suppose I could state the obvious and say "don't do that - just rebuild it"

Correct... but bummer... so much for being lazy...

> and I fear that (after checking the 2.x code) you really have no choice. OMPI
> v3.0 will have a way around the problem, but not the 2.x series.

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
Re: [OMPI users] disable slurm/munge from mpirun
On Thu, Jun 22, 2017 at 10:43 AM, John Hearns via users wrote:
> Having had some problems with ssh launching (a few minutes ago) I can
> confirm that this works:
>
> --mca plm_rsh_agent "ssh -v"

This doesn't do anything for me.

If I set OMPI_MCA_sec=^munge, I can clear the mca_sec_munge error, but the mca_pmix_pmix112 and opal_pmix_base_select errors still exist. The plm_rsh_agent switch/env var doesn't seem to affect that error.

Down the road I may still need the rsh_agent flag, but I think we're still before that point in the sequence of events.
Re: [OMPI users] disable slurm/munge from mpirun
That took care of one of the errors, but I missed a re-type on the second error:

mca_base_component_repository_open: unable to open mca_pmix_pmix112: libmunge missing

and the opal_pmix_base_select error is still there (which is what's actually halting my job).

On Thu, Jun 22, 2017 at 10:35 AM, r...@open-mpi.org <r...@open-mpi.org> wrote:
> You can add "OMPI_MCA_plm=rsh OMPI_MCA_sec=^munge" to your environment
>
> On Jun 22, 2017, at 7:28 AM, John Hearns via users
> <users@lists.open-mpi.org> wrote:
>
> Michael, try
> --mca plm_rsh_agent ssh
>
> I've been fooling with this myself recently, in the context of a PBS cluster
>
> On 22 June 2017 at 16:16, Michael Di Domenico <mdidomeni...@gmail.com> wrote:
>>
>> is it possible to disable slurm/munge/psm/pmi(x) from the mpirun
>> command line or (better) using environment variables?
>>
>> i'd like to use the installed version of openmpi i have on a
>> workstation, but it's linked with slurm from one of my clusters.
>>
>> mpi/slurm work just fine on the cluster, but when i run it on a
>> workstation i get the below errors
>>
>> mca_base_component_repository_open: unable to open mca_sec_munge: libmunge missing
>> ORTE_ERROR_LOG Not found in file ess_hnp_module.c at line 648
>> opal_pmix_base_select failed
>> returned value not found (-13) instead of orte_success
>>
>> there's probably a magical incantation of mca parameters, but i'm not
>> adept enough at determining what they are
[OMPI users] disable slurm/munge from mpirun
Is it possible to disable slurm/munge/psm/pmi(x) from the mpirun command line or (better) using environment variables?

I'd like to use the installed version of Open MPI I have on a workstation, but it's linked with Slurm from one of my clusters. MPI/Slurm work just fine on the cluster, but when I run it on a workstation I get the below errors:

mca_base_component_repository_open: unable to open mca_sec_munge: libmunge missing
ORTE_ERROR_LOG Not found in file ess_hnp_module.c at line 648
opal_pmix_base_select failed
returned value not found (-13) instead of orte_success

There's probably a magical incantation of MCA parameters, but I'm not adept enough at determining what they are.
Re: [OMPI users] openmpi 1.10.2 and PGI 15.9
On Mon, Jul 25, 2016 at 4:53 AM, Gilles Gouaillardet wrote:
>
> as a workaround, you can configure without -noswitcherror.
>
> after you ran configure, you have to manually patch the generated 'libtool'
> file and add the line with pgcc*) and the next line, like this:
>
> /* if pgcc is used, libtool does *not* pass -pthread to pgcc any more */
>
> # Convert "-framework foo" to "foo.ltframework"
> # and "-pthread" to "-Wl,-pthread" if NAG compiler
> if test -n "$inherited_linker_flags"; then
>   case "$CC" in
>     nagfor*)
>       tmp_inherited_linker_flags=`$ECHO "$inherited_linker_flags" | $SED 's/-framework \([^ $]*\)/\1.ltframework/g' | $SED 's/-pthread/-Wl,-pthread/g'`;;
>     pgcc*)
>       tmp_inherited_linker_flags=`$ECHO "$inherited_linker_flags" | $SED 's/-framework \([^ $]*\)/\1.ltframework/g' | $SED 's/-pthread//g'`;;
>     *)
>       tmp_inherited_linker_flags=`$ECHO "$inherited_linker_flags" | $SED 's/-framework \([^ $]*\)/\1.ltframework/g'`;;
>   esac
>
> i guess the right way is to patch libtool so it passes -noswitcherror to $CC
> and/or $LD, but i was not able to achieve that yet.

Thanks. I managed to work around the issue by hand-compiling the single module that failed during the build process, but something is definitely amiss in the Open MPI compile system when it comes to PGI.
Re: [OMPI users] openmpi 1.10.2 and PGI 15.9
So, the -noswitcherror is partially working. I added the switch to my configure line's LDFLAGS param. I can see the parameter being passed to libtool, but for some reason libtool refuses to pass it along at compile time.

If I `sh -x` the libtool command line, I can see it set in a few variables, but at the end, when libtool evals the compile line for pgcc, the option is missing. If I cut and paste the eval line and put the option back in by hand, the library compiles with a pgcc warning instead of an error, which I believe is what I want, but I'm not sure why libtool is dropping the switch.

On Tue, Jul 19, 2016 at 5:27 AM, Sylvain Jeaugey <sjeau...@nvidia.com> wrote:
> As a workaround, you can also try adding -noswitcherror to PGCC flags.
>
> On 07/11/2016 03:52 PM, Åke Sandgren wrote:
>>
>> Looks like you are compiling with slurm support.
>>
>> If so, you need to remove the "-pthread" from libslurm.la and libpmi.la
>>
>> On 07/11/2016 02:54 PM, Michael Di Domenico wrote:
>>>
>>> I'm trying to get openmpi compiled using the PGI compiler.
>>>
>>> the configure goes through and the code starts to compile, but then
>>> gets hung up with
>>>
>>> entering: openmpi-1.10.2/opal/mca/common/pmi
>>>   CC       common_pmi.lo
>>>   CCLD     libmca_common_pmi.la
>>> pgcc-Error-Unknown switch: - pthread
Re: [OMPI users] openmpi 1.10.2 and PGI 15.9
On Mon, Jul 11, 2016 at 9:52 AM, Åke Sandgren wrote:
> Looks like you are compiling with slurm support.
>
> If so, you need to remove the "-pthread" from libslurm.la and libpmi.la

I don't see a configure option in Slurm to disable pthreads, so I'm not sure this is possible.
Re: [OMPI users] openmpi 1.10.2 and PGI 15.9
On Thu, Jul 14, 2016 at 9:47 AM, Michael Di Domenico <mdidomeni...@gmail.com> wrote:
> Have 1.10.3 unpacked, ran through the configure using the same command
> line options as 1.10.2
>
> but it fails even earlier in the make process at
>
> Entering openmpi-1.10.3/opal/asm
>   CPPAS atomic-asm.lo
> This licensed Software was made available from Nvidia Corporation
> under a time-limited beta license. The beta license expires on Jun 1 2015.
> Any attempt to use this product after Jun 1 2015 is a violation of the terms
> of the PGI end user license agreement.

Sorry, I take this back; I accidentally used the PGI 15.3 compiler instead of 15.9. Using 15.9, I get the same -pthread error from the slurm_pmi library.
Re: [OMPI users] openmpi 1.10.2 and PGI 15.9
Have 1.10.3 unpacked; I ran through the configure using the same command line options as 1.10.2, but it fails even earlier in the make process, at:

Entering openmpi-1.10.3/opal/asm
  CPPAS atomic-asm.lo
This licensed Software was made available from Nvidia Corporation under a time-limited beta license. The beta license expires on Jun 1 2015. Any attempt to use this product after Jun 1 2015 is a violation of the terms of the PGI end user license agreement.

On Mon, Jul 11, 2016 at 9:11 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> Can you try the latest 1.10.3 instead ?
>
> btw, do you have a license for the pgCC C++ compiler ?
> fwiw, FreePGI on OSX has no C++ license, and PGI C and gnu g++ do not work
> together out of the box; hopefully I will have a fix ready sometime this
> week
>
> Cheers,
>
> Gilles
>
> On Monday, July 11, 2016, Michael Di Domenico <mdidomeni...@gmail.com> wrote:
>>
>> I'm trying to get openmpi compiled using the PGI compiler.
>>
>> the configure goes through and the code starts to compile, but then
>> gets hung up with
>>
>> entering: openmpi-1.10.2/opal/mca/common/pmi
>>   CC       common_pmi.lo
>>   CCLD     libmca_common_pmi.la
>> pgcc-Error-Unknown switch: - pthread
Re: [OMPI users] openmpi 1.10.2 and PGI 15.9
On Mon, Jul 11, 2016 at 9:11 AM, Gilles Gouaillardet wrote:
> Can you try the latest 1.10.3 instead ?

I can, but it'll take a few days to pull the software inside.

> btw, do you have a license for the pgCC C++ compiler ?
> fwiw, FreePGI on OSX has no C++ license and PGI C and gnu g++ does not work
> together out of the box, hopefully I will have a fix ready sometimes this
> week

We should, but I'm not positive. We're running PGI on Linux x64; we typically buy the full suite, but I'll double-check.
[OMPI users] openmpi 1.10.2 and PGI 15.9
I'm trying to get openmpi compiled using the PGI compiler. The configure goes through and the code starts to compile, but then gets hung up with:

entering: openmpi-1.10.2/opal/mca/common/pmi
  CC       common_pmi.lo
  CCLD     libmca_common_pmi.la
pgcc-Error-Unknown switch: - pthread
Re: [OMPI users] locked memory and queue pairs
On Thu, Mar 17, 2016 at 12:15 PM, Cabral, Matias A wrote:
> I was looking for lines like "[nodexyz:17085] selected cm best priority 40"
> and "[nodexyz:17099] select: component psm selected"

This may have turned up more than I expected. I recompiled Open MPI v1.8.4 as a test and reran the tests, which seemed to run just fine. Looking at the debug output, I can clearly see a difference in the psm calls. I performed the same test using 1.10.2, and it works as well.

I've sent a message off to the user to have him rerun and see where we're at. I suspect my system-level compile of Open MPI might be all screwed up with respect to psm. I didn't see anything off in the configure output, but I must have missed something. I'll report back.
Re: [OMPI users] locked memory and queue pairs
On Thu, Mar 17, 2016 at 12:52 PM, Jeff Squyres (jsquyres) wrote:
> Can you send all the information listed here?
>
> https://www.open-mpi.org/community/help/
>
> (including the full output from the run with the PML/BTL/MTL/etc. verbosity)
>
> This will allow Matias to look through all the relevant info, potentially
> with fewer back-n-forth emails.

Understood, but unfortunately I cannot pull large dumps from the system; it's isolated.
Re: [OMPI users] locked memory and queue pairs
On Thu, Mar 17, 2016 at 12:15 PM, Cabral, Matias A wrote:
> I was looking for lines like "[nodexyz:17085] selected cm best priority 40"
> and "[nodexyz:17099] select: component psm selected"

I see "cm best priority 20", which seems to relate to ob1 being selected. I don't see a mention of psm anywhere (I am NOT doing --mca mtl ^psm), but I did compile Open MPI with psm support.
Re: [OMPI users] locked memory and queue pairs
On Wed, Mar 16, 2016 at 4:49 PM, Cabral, Matias A wrote:
> I didn't go into the code to see who is actually calling this error message,
> but I suspect this may be a generic error for "out of memory" kind of thing
> and not specific to the queue pair. To confirm, please add -mca
> pml_base_verbose 100 and add -mca mtl_base_verbose 100 to see what is being
> selected.

This didn't spit out anything overly useful, just lots of lines:

[node001:00909] mca: base: components_register: registering pml components
[node001:00909] mca: base: components_register: found loaded component v
[node001:00909] mca: base: components_register: component v register function successful
[node001:00909] mca: base: components_register: found loaded component bfo
[node001:00909] mca: base: components_register: component bfo register function successful
[node001:00909] mca: base: components_register: found loaded component cm
[node001:00909] mca: base: components_register: component cm register function successful
[node001:00909] mca: base: components_register: found loaded component ob1
[node001:00909] mca: base: components_register: component ob1 register function successful

> I'm trying to remember some details of IMB and alltoallv to see if it is
> indeed requiring more resources than the other micro benchmarks.

I'm using IMB for my tests, but this issue came up because a researcher isn't able to run large alltoall codes, so I don't believe it's specific to IMB.

> BTW, did you confirm the limits setup? Also, do the nodes have all the same
> amount of mem?

Yes, all nodes have the limits set to unlimited, and each node has 256GB of memory.
Re: [OMPI users] locked memory and queue pairs
On Wed, Mar 16, 2016 at 3:37 PM, Cabral, Matias A wrote:
> Hi Michael,
>
> I may be missing some context; if you are using the qlogic cards you will
> always want to use the psm mtl (-mca pml cm -mca mtl psm) and not the openib btl.
> As Tom suggests, confirm the limits are set up on every node: could it be the
> alltoall is reaching a node that "others" are not? Please share the command
> line and the error message.

Yes, under normal circumstances I use PSM. I only disabled it to see if it effected any kind of change.

The test I'm running is:

mpirun -n 512 ./IMB-MPI1 alltoallv

When the system gets to 128 ranks, it freezes and errors out with:

---
A process failed to create a queue pair. This usually means either
the device has run out of queue pairs (too many connections) or
there are insufficient resources available to allocate a queue pair
(out of memory). The latter can happen if either 1) insufficient
memory is available, or 2) no more physical memory can be registered
with the device.

For more information on memory registration see the Open MPI FAQs at:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

Local host: node001
Local device: qib0
Queue pair type: Reliable connected (RC)
---

I've also tried various nodes across the cluster (200+). I think I've ruled out errant switch (qlogic single 12800-120) problems, bad cables, and bad nodes. That's not to say they may not be present; I've just not been able to find them.
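For a sense of scale (my back-of-the-envelope arithmetic, not from the thread): with reliable-connected (RC) transport, a fully connected alltoall creates roughly one queue pair per peer per rank, so demand grows quadratically with job size — which fits the symptom of only large jobs failing.

```shell
# Rough RC queue-pair demand: each of R ranks opens a QP to each of the
# other R-1 ranks, so a node hosting P ranks needs about P*(R-1) QPs on
# its HCA. Numbers taken from the test above (512 ranks, 16 per node).
R=512
P=16
per_rank=$((R - 1))
per_node=$((P * per_rank))
echo "QPs per rank: $per_rank, per node: $per_node"
```

Whether 8k-plus QPs per HCA exceeds the qib device's limit is hardware/driver dependent, so treat this only as a way to reason about why the failure appears at scale.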
Re: [OMPI users] locked memory and queue pairs
On Wed, Mar 16, 2016 at 12:12 PM, Elken, Tom wrote:
> Hi Mike,
>
> In this file,
> $ cat /etc/security/limits.conf
> ...
> do you see at the end ...
>
> * hard memlock unlimited
> * soft memlock unlimited
> # -- All InfiniBand Settings End here --
> ?

Yes. I double-checked that it's set on all compute nodes, both in the actual file and through the ulimit command.
Re: [OMPI users] locked memory and queue pairs
On Thu, Mar 10, 2016 at 11:54 AM, Michael Di Domenico <mdidomeni...@gmail.com> wrote:
> when i try to run an openmpi job with >128 ranks (16 ranks per node)
> using alltoall or alltoallv, i'm getting an error that the process was
> unable to get a queue pair.
>
> i've checked the max locked memory settings across my machines:
>
> using ulimit -l in and outside of mpirun, and they're all set to unlimited
> pam modules, to ensure pam_limits.so is loaded and working
> /etc/security/limits.conf is set for soft/hard mem to unlimited
>
> i tried a couple of quick mpi config settings i could think of:
>
> -mca mtl ^psm: no effect
> -mca btl_openib_flags 1: no effect
>
> the openmpi faq says to tweak some mtt values in /sys, but since i'm
> not on mellanox that doesn't apply to me
>
> the machines are rhel 6.7, kernel 2.6.32-573.12.1 (with bundled ofed),
> running on qlogic single-port infiniband cards, psm is enabled
>
> other collectives seem to run okay; it seems to only be alltoall comms
> that fail, and only at scale
>
> i believe (but can't prove) that this worked at one point, but i can't
> recall when i last tested it. so it's reasonable to assume that some
> change to the system is preventing this.
>
> the question is, where should i start poking to find it?

bump?
[OMPI users] locked memory and queue pairs
When I try to run an Open MPI job with >128 ranks (16 ranks per node) using alltoall or alltoallv, I'm getting an error that the process was unable to get a queue pair.

I've checked the max locked memory settings across my machines:

- using ulimit -l in and outside of mpirun, and they're all set to unlimited
- PAM modules, to ensure pam_limits.so is loaded and working
- /etc/security/limits.conf is set for soft/hard memlock to unlimited

I tried a couple of quick MPI config settings I could think of:

- -mca mtl ^psm: no effect
- -mca btl_openib_flags 1: no effect

The Open MPI FAQ says to tweak some MTT values in /sys, but since I'm not on Mellanox, that doesn't apply to me.

The machines are RHEL 6.7, kernel 2.6.32-573.12.1 (with bundled OFED), running on qlogic single-port InfiniBand cards; psm is enabled.

Other collectives seem to run okay; it seems to be only alltoall comms that fail, and only at scale.

I believe (but can't prove) that this worked at one point, but I can't recall when I last tested it, so it's reasonable to assume that some change to the system is preventing this.

The question is: where should I start poking to find it?
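On the ulimit checks above, one pitfall worth ruling out (editorial note, not from the thread): the limit an interactive login shell reports can differ from what non-interactive children of sshd or slurmd inherit, which is the path MPI processes actually take. A hedged sketch; the node names in the commented loop are placeholders.

```shell
# Check the locked-memory limit in this shell, then (commented) the form
# for checking what non-interactive children see on each compute node --
# the value MPI ranks actually inherit.
memlock=$(ulimit -l)
echo "local memlock: $memlock"
# for n in node001 node002; do
#     ssh "$n" 'echo "$(hostname): $(ulimit -l)"'
# done
# srun -N<nodes> bash -c 'echo "$(hostname): $(ulimit -l)"'  # Slurm path
```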
[OMPI users] slurm openmpi 1.8.3 core bindings
I'm trying to get Slurm and Open MPI to cooperate when running multi-threaded jobs. I'm sure I'm doing something wrong, but I can't figure out what.

My node configuration is: 2 nodes, 2 sockets, 6 cores per socket.

I want to run:

sbatch -N2 -n 8 --ntasks-per-node=4 --cpus-per-task=3 -w node1,node2 program.sbatch

Inside program.sbatch, I'm calling Open MPI:

mpirun -n $SLURM_NTASKS --report-bindings program

When the bindings report comes out, I get:

node1 rank 0 socket 0 core 0
node1 rank 1 socket 1 core 6
node1 rank 2 socket 0 core 1
node1 rank 3 socket 1 core 7
node2 rank 4 socket 0 core 0
node2 rank 5 socket 1 core 6
node2 rank 6 socket 0 core 1
node2 rank 7 socket 1 core 7

which is semi-fine, but when the job runs, the resulting threads from the program are locked (according to top) to those eight cores, rather than spreading themselves over the 24 cores available.

I tried a few incantations of map-by, bind-to, etc., but Open MPI basically complained about everything I tried, for one reason or another. My understanding is that Slurm should be passing the requested config to Open MPI (or Open MPI is pulling it from the environment somehow) and it should magically work.

If I skip Slurm and run:

mpirun -n 8 --map-by node:pe=3 -bind-to core -host node1,node2 --report-bindings program

node1 rank 0 socket 0 core 0
node2 rank 1 socket 0 core 0
node1 rank 2 socket 0 core 3
node2 rank 3 socket 0 core 3
node1 rank 4 socket 1 core 6
node2 rank 5 socket 1 core 6
node1 rank 6 socket 1 core 9
node2 rank 7 socket 1 core 9

I do get the behavior I want (though I would prefer a -npernode switch in there, but Open MPI complains).
The bindings look better, and the threads are not locked to the particular cores. Therefore, I'm pretty sure this is a problem between Open MPI and Slurm, and not necessarily with either individually.

I did compile Open MPI with the Slurm support switch, and we're using the cgroups task plugin within Slurm.

I guess, ancillary to this: is there a way to turn off the core binding/placement routines and control the placement manually?
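One hedged sketch of what program.sbatch could do (editorial suggestion, not a confirmed fix): feed Slurm's cpus-per-task value into the same `--map-by ...:pe=N` form that already produced the desired bindings outside Slurm, so each rank gets a cpus-per-task-sized slot. The default assignments below exist only so the sketch runs outside a Slurm allocation.

```shell
# Sketch for program.sbatch: size the pe= mapping from Slurm's environment
# instead of hard-coding it. Defaults mirror the sbatch line above
# (-n 8 --cpus-per-task=3) for running outside Slurm.
: "${SLURM_NTASKS:=8}" "${SLURM_CPUS_PER_TASK:=3}"
MAP_BY="node:pe=${SLURM_CPUS_PER_TASK}"
echo mpirun -n "$SLURM_NTASKS" --map-by "$MAP_BY" --bind-to core --report-bindings ./program
```

Whether `--bind-to core` fights the cgroups task plugin's own confinement is worth testing separately; `--bind-to none` is the usual way to turn Open MPI's placement off entirely and manage it by hand.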
Re: [OMPI users] ipath_userinit errors
Andrew,

Thanks. We're using the RHEL version because it was less complicated for our environment in the past, but it sounds like we might want to reconsider that decision.

Do you know why we don't see the message with lower node-count allocations? It only seems to happen when the node count gets over a certain point.

thanks

On Wed, Nov 5, 2014 at 5:51 PM, Friedley, Andrew <andrew.fried...@intel.com> wrote:
> Hi Michael,
>
> From what I understand, this is an issue with the qib driver and PSM from
> RHEL 6.5 and 6.6, and will be fixed for 6.7. There is no functional change
> between qib->PSM API versions 11 and 12, so the message is harmless. I
> presume you're using the RHEL sourced package for a reason, but using an IFS
> release would fix the problem until RHEL 6.7 is ready.
>
> Andrew
>
>> -Original Message-
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Michael Di Domenico
>> Sent: Tuesday, November 4, 2014 8:35 AM
>> To: Open MPI Users
>> Subject: [OMPI users] ipath_userinit errors
>>
>> I'm getting the below message on my cluster(s). It seems to only happen
>> when I try to use more than 64 nodes (16 cores each). The clusters are
>> running RHEL 6.5 with Slurm and Openmpi-1.6.5 with PSM.
>> I'm using the OFED versions included with RHEL for infiniband support.
>>
>> ipath_userinit: Mismatched user minor version (12) and driver minor version
>> (11) while context sharing. Ensure that driver and library are from the same
>> release
>>
>> I already realize this is a warning message and the jobs complete.
>> Another user a little over a year ago had a similar issue that was tracked to
>> mismatched ofed versions. Since I have a diskless cluster, all my nodes are
>> identical.
>>
>> I'm not averse to thinking there might be something unique about my
>> machine, but since I have two separate machines doing it, I'm not really sure
>> where to look to triage the issue and see what might be set incorrectly.
>> Any thoughts on where to start checking would be helpful, thanks...
[OMPI users] ipath_userinit errors
I'm getting the below message on my cluster(s). It seems to only happen when I try to use more than 64 nodes (16 cores each). The clusters are running RHEL 6.5 with Slurm and Openmpi-1.6.5 with PSM. I'm using the OFED versions included with RHEL for InfiniBand support.

ipath_userinit: Mismatched user minor version (12) and driver minor version (11) while context sharing. Ensure that driver and library are from the same release

I already realize this is a warning message and the jobs complete. Another user a little over a year ago had a similar issue that was tracked to mismatched OFED versions. Since I have a diskless cluster, all my nodes are identical.

I'm not averse to thinking there might be something unique about my machine, but since I have two separate machines doing it, I'm not really sure where to look to triage the issue and see what might be set incorrectly.

Any thoughts on where to start checking would be helpful, thanks...
Re: [OMPI users] debugs for jobs not starting
I turned on the daemon debugs for orted and noticed this difference. I get this on all the good nodes (the ones that actually started xhpl):

Daemon was launched on node08 - beginning to initialize
[node08:21230] [[64354,0],1] orted_cmd: received add_local_procs
[node08:21230] [[64354,0],0] orted_recv: received sync+nidmap from local proc [[64354,1],84]
[node08:21230] [[64354,0],0] orted_recv: received sync+nidmap from local proc [[64354,1],85]
[node08:21230] [[64354,0],0] orted_recv: received sync+nidmap from local proc [[64354,1],86]
[node08:21230] [[64354,0],0] orted_recv: received sync+nidmap from local proc [[64354,1],87]
[node08:21230] [[64354,0],0] orted_recv: received sync+nidmap from local proc [[64354,1],88]
[node08:21230] [[64354,0],0] orted_recv: received sync+nidmap from local proc [[64354,1],89]
[node08:21230] [[64354,0],0] orted_recv: received sync+nidmap from local proc [[64354,1],90]
[node08:21230] [[64354,0],0] orted_recv: received sync+nidmap from local proc [[64354,1],91]
[node08:21230] [[64354,0],0] orted_recv: received sync+nidmap from local proc [[64354,1],92]
[node08:21230] [[64354,0],0] orted_recv: received sync+nidmap from local proc [[64354,1],93]
[node08:21230] [[64354,0],0] orted_recv: received sync+nidmap from local proc [[64354,1],94]
[node08:21230] [[64354,0],0] orted_recv: received sync+nidmap from local proc [[64354,1],95]
[node08:21230] [[64354,0],1] orted: up and running - waiting for commands!
[node08:21230] procdir: /tmp/openmpi-sessions-user@node08_0/28/1/1
[node08:21230] jobdir: /tmp/openmpi-sessions-user@node08_/44228/1
[node08:21230] top: openmpi-sessions-user@node08_0
[node08:21230] tmp: /tmp
[...repeats the above five lines a bunch of times...]

---

And this on the nodes that do not start xhpl:

Daemon was launched on node06 - beginning to initialize
[node06:11230] [[46344,0],1] orted: up and running - waiting for commands!
[node06:11230] procdir: /tmp/openmpi-sessions-user@node06_0/28/1/1
[node06:11230] jobdir: /tmp/openmpi-sessions-user@node06_/44228/1
[node06:11230] top: openmpi-sessions-user@node06_0
[node06:11230] tmp: /tmp
[...above lines only come out once...]

On Fri, Oct 12, 2012 at 9:27 AM, Michael Di Domenico <mdidomeni...@gmail.com> wrote: > what isn't working is when i fire off an MPI job with over 800 ranks, > they don't all actually start up a process > > fe, if i do srun -n 1024 --ntasks-per-node 12 xhpl > > and then do a 'pgrep xhpl | wc -l', on all of the allocated nodes, not > all of them have actually started xhpl > > most will read 12 started processes, but an inconsistent list of nodes > will fail to actually start xhpl and stall the whole job > > if i look at all the nodes allocated to my job, it does start the orte > process though > > what i need to figure out, is why the orte process starts, but fails > to actually start xhpl on some of the nodes > > unfortunately, the list of nodes that don't start xhpl during my runs > changes each time and no hardware errors are being detected. if i > cancel the job and restart the job over and over, eventually one will > actually kick off and run to completion. > > if i run the process outside of slurm just using openmpi, it seems to > behave correctly, so i'm leaning towards a slurm interacting with > openmpi problem. > > what i'd like to do is instrument a debug in openmpi that will tell me > what openmpi is waiting on in order to kick off the xhpl binary > > i'm testing to see whether it's a psm related problem now, i'll check > back if i can narrow the scope a little more > > On Thu, Oct 11, 2012 at 10:21 PM, Ralph Castain <r...@open-mpi.org> wrote: >> I'm afraid I'm confused - I don't understand what is and isn't working. What >> "next process" isn't starting? 
>> >> >> On Thu, Oct 11, 2012 at 9:41 AM, Michael Di Domenico >> <mdidomeni...@gmail.com> wrote: >>> >>> adding some additional info >>> >>> did an strace on an orted process where xhpl failed to start, i did >>> this after the mpirun execution, so i probably missed some output, but >>> it keeps scrolling >>> >>> poll([{fd=4, events=POLLIN},{fd=7, events=POLLIN},{fd=8, >>> events=POLLIN},{fd=10, events=POLLIN},{fd=12, events=POLLIN},{fd=13, >>> events=POLLIN},{fd=14, events=POLLIN},{fd=15, events=POLLIN},{fd=16, >>> events=POLLIN}], 9, 1000) = 0 (Timeout) >>> >>> i didn't see anything useful in /proc under those file descriptors, >>> but perhaps i missed something i don't know to look for >>> >>> On Thu, Oct 11, 2012 at 12:06 PM, Michael Di Domenico >>> <mdidomeni...@gmail.com> wrote: >>> > too add a little more detail, it looks like xhpl is not actually >>> > starting on all nodes when i kick of
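The daemon output above comes from ORTE's daemon-debug switches. When jobs are launched through srun there is no mpirun command line to attach flags to, so the usual route is the environment form of the MCA parameters. A sketch — the parameter names are from the ORTE layer of the OMPI 1.x series; verify them against your build with `ompi_info --param orte all`:

```shell
# Environment form of the daemon debug switches, so srun-launched jobs
# emit the per-node orted traces shown above.
export OMPI_MCA_orte_debug_daemons=1   # daemon launch/command tracing
export OMPI_MCA_orte_debug=1           # general ORTE debug output
```

With these exported in the job environment, a launch like `srun -n 1024 --ntasks-per-node 12 xhpl` should produce the "Daemon was launched on ..." and orted_recv messages on every node, making the good/bad node comparison above repeatable.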
Re: [OMPI users] debugs for jobs not starting
what isn't working is when i fire off an MPI job with over 800 ranks, they don't all actually start up a process e.g., if i do srun -n 1024 --ntasks-per-node 12 xhpl and then do a 'pgrep xhpl | wc -l' on all of the allocated nodes, not all of them have actually started xhpl most will read 12 started processes, but an inconsistent list of nodes will fail to actually start xhpl and stall the whole job if i look at all the nodes allocated to my job, it does start the orte process though what i need to figure out is why the orte process starts, but fails to actually start xhpl on some of the nodes unfortunately, the list of nodes that don't start xhpl during my runs changes each time and no hardware errors are being detected. if i cancel the job and restart the job over and over, eventually one will actually kick off and run to completion. if i run the process outside of slurm just using openmpi, it seems to behave correctly, so i'm leaning towards a slurm interacting with openmpi problem. what i'd like to do is instrument a debug in openmpi that will tell me what openmpi is waiting on in order to kick off the xhpl binary i'm testing to see whether it's a psm related problem now, i'll check back if i can narrow the scope a little more On Thu, Oct 11, 2012 at 10:21 PM, Ralph Castain <r...@open-mpi.org> wrote: > I'm afraid I'm confused - I don't understand what is and isn't working. What > "next process" isn't starting? 
> > > On Thu, Oct 11, 2012 at 9:41 AM, Michael Di Domenico > <mdidomeni...@gmail.com> wrote: >> >> adding some additional info >> >> did an strace on an orted process where xhpl failed to start, i did >> this after the mpirun execution, so i probably missed some output, but >> it keeps scrolling >> >> poll([{fd=4, events=POLLIN},{fd=7, events=POLLIN},{fd=8, >> events=POLLIN},{fd=10, events=POLLIN},{fd=12, events=POLLIN},{fd=13, >> events=POLLIN},{fd=14, events=POLLIN},{fd=15, events=POLLIN},{fd=16, >> events=POLLIN}], 9, 1000) = 0 (Timeout) >> >> i didn't see anything useful in /proc under those file descriptors, >> but perhaps i missed something i don't know to look for >> >> On Thu, Oct 11, 2012 at 12:06 PM, Michael Di Domenico >> <mdidomeni...@gmail.com> wrote: >> > too add a little more detail, it looks like xhpl is not actually >> > starting on all nodes when i kick off the mpirun >> > >> > each time i cancel and restart the job, the nodes that do not start >> > change, so i can't call it a bad node >> > >> > if i disable infiniband with --mca btl self,sm,tcp on occasion i can >> > get xhpl to actually run, but it's not consistent >> > >> > i'm going to check my ethernet network and make sure there's no >> > problems there (could this be an OOB error with mpirun?), on the nodes >> > that fail to start xhpl, i do see the orte process, but nothing in the >> > logs about why it failed to launch xhpl >> > >> > >> > >> > On Thu, Oct 11, 2012 at 11:49 AM, Michael Di Domenico >> > <mdidomeni...@gmail.com> wrote: >> >> I'm trying to diagnose an MPI job (in this case xhpl), that fails to >> >> start when the rank count gets fairly high into the thousands. >> >> >> >> My symptom is the jobs fires up via slurm, and I can see all the xhpl >> >> processes on the nodes, but it never kicks over to the next process. >> >> >> >> My question is, what debugs should I turn on to tell me what the >> >> system might be waiting on? 
>> >> >> >> I've checked a bunch of things, but I'm probably overlooking something >> >> trivial (which is par for me). >> >> >> >> I'm using the Openmpi 1.6.1, Slurm 2.4.2 on CentOS 6.3, with >> >> Infiniband/PSM
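The `pgrep xhpl | wc -l` survey described in this thread boils down to a per-host comparison step. Reduced here so it can be shown without a cluster — the hostnames and counts are made up; live, you would feed it one "host count" pair per allocated node, e.g. from running `pgrep -c xhpl` on each:

```shell
# Report hosts that started fewer xhpl processes than --ntasks-per-node
# asked for, given "host count" pairs on stdin.
expected=12
check_counts() {
  awk -v want="$expected" '$2 != want { print $1 " started " $2 "/" want }'
}

bad=$(printf 'node06 0\nnode07 12\nnode08 12\n' | check_counts)
echo "$bad"
```

The output gives exactly the "inconsistent list of nodes" the thread describes, making it easy to diff across reruns and confirm the failing set really changes each time.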
Re: [OMPI users] debugs for jobs not starting
adding some additional info did an strace on an orted process where xhpl failed to start, i did this after the mpirun execution, so i probably missed some output, but it keeps scrolling poll([{fd=4, events=POLLIN},{fd=7, events=POLLIN},{fd=8, events=POLLIN},{fd=10, events=POLLIN},{fd=12, events=POLLIN},{fd=13, events=POLLIN},{fd=14, events=POLLIN},{fd=15, events=POLLIN},{fd=16, events=POLLIN}], 9, 1000) = 0 (Timeout) i didn't see anything useful in /proc under those file descriptors, but perhaps i missed something i don't know to look for On Thu, Oct 11, 2012 at 12:06 PM, Michael Di Domenico <mdidomeni...@gmail.com> wrote: > too add a little more detail, it looks like xhpl is not actually > starting on all nodes when i kick off the mpirun > > each time i cancel and restart the job, the nodes that do not start > change, so i can't call it a bad node > > if i disable infiniband with --mca btl self,sm,tcp on occasion i can > get xhpl to actually run, but it's not consistent > > i'm going to check my ethernet network and make sure there's no > problems there (could this be an OOB error with mpirun?), on the nodes > that fail to start xhpl, i do see the orte process, but nothing in the > logs about why it failed to launch xhpl > > > > On Thu, Oct 11, 2012 at 11:49 AM, Michael Di Domenico > <mdidomeni...@gmail.com> wrote: >> I'm trying to diagnose an MPI job (in this case xhpl), that fails to >> start when the rank count gets fairly high into the thousands. >> >> My symptom is the jobs fires up via slurm, and I can see all the xhpl >> processes on the nodes, but it never kicks over to the next process. >> >> My question is, what debugs should I turn on to tell me what the >> system might be waiting on? >> >> I've checked a bunch of things, but I'm probably overlooking something >> trivial (which is par for me). >> >> I'm using the Openmpi 1.6.1, Slurm 2.4.2 on CentOS 6.3, with Infiniband/PSM
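One way to see what the descriptors in that poll() set actually point at is to resolve them through /proc rather than eyeballing the directory. Demonstrated here against the current shell (`$$`) since the stuck orted can't be reproduced outside the cluster; on the real node you would substitute orted's pid (e.g. from `pgrep orted`):

```shell
# List each open file descriptor of a process with its target (socket,
# pipe, file), which identifies what the poll() loop is waiting on.
pid=$$
fds=$(for fd in /proc/$pid/fd/*; do
  printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
done)
echo "$fds"
```

Sockets show up as `socket:[inode]`; the inode can then be matched against `ss`/`netstat` output to find the peer, which narrows down whether the daemon is stuck waiting on the out-of-band TCP side or something local.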
Re: [OMPI users] debugs for jobs not starting
to add a little more detail, it looks like xhpl is not actually starting on all nodes when i kick off the mpirun each time i cancel and restart the job, the nodes that do not start change, so i can't call it a bad node if i disable infiniband with --mca btl self,sm,tcp on occasion i can get xhpl to actually run, but it's not consistent i'm going to check my ethernet network and make sure there are no problems there (could this be an OOB error with mpirun?), on the nodes that fail to start xhpl, i do see the orte process, but nothing in the logs about why it failed to launch xhpl On Thu, Oct 11, 2012 at 11:49 AM, Michael Di Domenico <mdidomeni...@gmail.com> wrote: > I'm trying to diagnose an MPI job (in this case xhpl), that fails to > start when the rank count gets fairly high into the thousands. > > My symptom is the job fires up via slurm, and I can see all the xhpl > processes on the nodes, but it never kicks over to the next process. > > My question is, what debugs should I turn on to tell me what the > system might be waiting on? > > I've checked a bunch of things, but I'm probably overlooking something > trivial (which is par for me). > > I'm using Openmpi 1.6.1 and Slurm 2.4.2 on CentOS 6.3, with Infiniband/PSM
[OMPI users] debugs for jobs not starting
I'm trying to diagnose an MPI job (in this case xhpl) that fails to start when the rank count gets fairly high, into the thousands. My symptom is the job fires up via slurm, and I can see all the xhpl processes on the nodes, but it never kicks over to the next process. My question is, what debugs should I turn on to tell me what the system might be waiting on? I've checked a bunch of things, but I'm probably overlooking something trivial (which is par for me). I'm using Openmpi 1.6.1 and Slurm 2.4.2 on CentOS 6.3, with Infiniband/PSM
Re: [OMPI users] srun and openmpi
Certainly, i reached out to several contacts I have inside qlogic (i used to work there)... On Fri, Apr 29, 2011 at 10:30 AM, Ralph Castain wrote: > Hi Michael > > I'm told that the Qlogic contacts we used to have are no longer there. Since > you obviously are a customer, can you ping them and ask (a) what that error > message means, and (b) what's wrong with the values I computed? > > You can also just send them my way, if that would help. We just need someone > to explain the requirements on that precondition value. > > Thanks > Ralph
Re: [OMPI users] srun and openmpi
On Fri, Apr 29, 2011 at 10:01 AM, Michael Di Domenico <mdidomeni...@gmail.com> wrote: > On Fri, Apr 29, 2011 at 4:52 AM, Ralph Castain <r...@open-mpi.org> wrote: >> Hi Michael >> >> Please see the attached updated patch to try for 1.5.3. I mistakenly free'd >> the envar after adding it to the environ :-/ > > The patch works great, i can now see the precondition environment > variable if i do > > mpirun -n 2 -host node1 > > and my runs just fine, However if i do > > srun --resv-ports -n 2 -w node1 > > I get > > [node1:16780] PSM EP connect error (unknown connect error): > [node1:16780] node1 > [node1:16780] PSM EP connect error (Endpoint could not be reached): > [node1:16780] node1 > > PML add procs failed > --> Returned "Error" (-1) instead of "Success" (0) > > I did notice a difference in the precondition env variable between the two > runs > > mpirun -n 2 -host node1 > > sets precondition_transports=fbc383997ee1b668-00d40f1401d2e827 (which > changes with each run (aka random)) > > srun --resv-ports -n 2 -w node1 this should have been "srun --resv-ports -n 1 -w node1 ", i can't run a 2 rank job, i get the PML error above > > sets precondition_transports=1845-0001 (which > doesn't seem to change run to run) >
Re: [OMPI users] srun and openmpi
On Fri, Apr 29, 2011 at 4:52 AM, Ralph Castain wrote: > Hi Michael > > Please see the attached updated patch to try for 1.5.3. I mistakenly free'd > the envar after adding it to the environ :-/ The patch works great, i can now see the precondition environment variable if i do mpirun -n 2 -host node1 and my job runs just fine. However, if i do srun --resv-ports -n 2 -w node1 I get [node1:16780] PSM EP connect error (unknown connect error): [node1:16780] node1 [node1:16780] PSM EP connect error (Endpoint could not be reached): [node1:16780] node1 PML add procs failed --> Returned "Error" (-1) instead of "Success" (0) I did notice a difference in the precondition env variable between the two runs mpirun -n 2 -host node1 sets precondition_transports=fbc383997ee1b668-00d40f1401d2e827 (which changes with each run (aka random)) srun --resv-ports -n 2 -w node1 sets precondition_transports=1845-0001 (which doesn't seem to change run to run)
Re: [OMPI users] srun and openmpi
On Thu, Apr 28, 2011 at 9:03 AM, Ralph Castain <r...@open-mpi.org> wrote: > > On Apr 28, 2011, at 6:49 AM, Michael Di Domenico wrote: > >> On Wed, Apr 27, 2011 at 11:47 PM, Ralph Castain <r...@open-mpi.org> wrote: >>> >>> On Apr 27, 2011, at 1:06 PM, Michael Di Domenico wrote: >>> >>>> On Wed, Apr 27, 2011 at 2:46 PM, Ralph Castain <r...@open-mpi.org> wrote: >>>>> >>>>> On Apr 27, 2011, at 12:38 PM, Michael Di Domenico wrote: >>>>> >>>>>> On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain <r...@open-mpi.org> wrote: >>>>>>> >>>>>>> On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote: >>>>>>> >>>>>>>> Was this ever committed to the OMPI src as something not having to be >>>>>>>> run outside of OpenMPI, but as part of the PSM setup that OpenMPI >>>>>>>> does? >>>>>>> >>>>>>> Not that I know of - I don't think the PSM developers ever looked at it. >>> >>> Thought about this some more and I believe I have a soln to the problem. >>> Will try to commit something to the devel trunk by the end of the week. >> >> Thanks > > Just to save me looking back thru the thread - what OMPI version are you > using? If it isn't the trunk, I'll send you a patch you can use. I'm using OpenMPI v1.5.3 currently
Re: [OMPI users] srun and openmpi
On Wed, Apr 27, 2011 at 11:47 PM, Ralph Castain <r...@open-mpi.org> wrote: > > On Apr 27, 2011, at 1:06 PM, Michael Di Domenico wrote: > >> On Wed, Apr 27, 2011 at 2:46 PM, Ralph Castain <r...@open-mpi.org> wrote: >>> >>> On Apr 27, 2011, at 12:38 PM, Michael Di Domenico wrote: >>> >>>> On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain <r...@open-mpi.org> wrote: >>>>> >>>>> On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote: >>>>> >>>>>> Was this ever committed to the OMPI src as something not having to be >>>>>> run outside of OpenMPI, but as part of the PSM setup that OpenMPI >>>>>> does? >>>>> >>>>> Not that I know of - I don't think the PSM developers ever looked at it. > > Thought about this some more and I believe I have a soln to the problem. Will > try to commit something to the devel trunk by the end of the week. Thanks
Re: [OMPI users] srun and openmpi
On Wed, Apr 27, 2011 at 2:46 PM, Ralph Castain <r...@open-mpi.org> wrote: > > On Apr 27, 2011, at 12:38 PM, Michael Di Domenico wrote: > >> On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain <r...@open-mpi.org> wrote: >>> >>> On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote: >>> >>>> Was this ever committed to the OMPI src as something not having to be >>>> run outside of OpenMPI, but as part of the PSM setup that OpenMPI >>>> does? >>> >>> Not that I know of - I don't think the PSM developers ever looked at it. >>> >>>> >>>> I'm having some trouble getting Slurm/OpenMPI to play nice with the >>>> setup of this key. Namely, with slurm you cannot export variables >>>> from the --prolog of an srun, only from an --task-prolog, >>>> unfortunately, if you use a task-prolog each rank gets a different >>>> key, which doesn't work. >>>> >>>> I'm also guessing that each unique mpirun needs it's own psm key, not >>>> one for the whole system, so i can't just make it a permanent >>>> parameter somewhere else. >>>> >>>> Also, i recall reading somewhere that the --resv-ports parameter that >>>> OMPI uses from slurm to choose a list of ports to use for TCP comm's, >>>> tries to lock a port from the pool three times before giving up. >>> >>> Had to look back at the code - I think you misread this. I can find no >>> evidence in the code that we try to bind that port more than once. >> >> Perhaps i misstated, i don't believe you're trying to bind to the same >> port twice during the same session. i believe the code re-uses >> similar ports from session to session. what i believe happens (but >> could be totally wrong) the previous session releases the port, but >> linux isn't quite done with it when the new session tries to bind to >> the port, in which case it tries three times and then fails the job > > Actually, I understood you correctly. I'm just saying that I find no evidence > in the code that we try three times before giving up. 
What I see is a single > attempt to bind the port - if it fails, then we abort. There is no parameter > to control that behavior. > > So if the OS hasn't released the port by the time a new job starts on that > node, then it will indeed abort if the job was unfortunately given the same > port reservation. Oh, okay, sorry...
Re: [OMPI users] srun and openmpi
On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain <r...@open-mpi.org> wrote: > > On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote: > >> Was this ever committed to the OMPI src as something not having to be >> run outside of OpenMPI, but as part of the PSM setup that OpenMPI >> does? > > Not that I know of - I don't think the PSM developers ever looked at it. > >> >> I'm having some trouble getting Slurm/OpenMPI to play nice with the >> setup of this key. Namely, with slurm you cannot export variables >> from the --prolog of an srun, only from an --task-prolog, >> unfortunately, if you use a task-prolog each rank gets a different >> key, which doesn't work. >> >> I'm also guessing that each unique mpirun needs its own psm key, not >> one for the whole system, so i can't just make it a permanent >> parameter somewhere else. >> >> Also, i recall reading somewhere that the --resv-ports parameter that >> OMPI uses from slurm to choose a list of ports to use for TCP comm's, >> tries to lock a port from the pool three times before giving up. > > Had to look back at the code - I think you misread this. I can find no > evidence in the code that we try to bind that port more than once. Perhaps i misstated, i don't believe you're trying to bind to the same port twice during the same session. i believe the code re-uses similar ports from session to session. what i believe happens (but could be totally wrong) is that the previous session releases the port, but linux isn't quite done with it when the new session tries to bind to the port, in which case it tries three times and then fails the job
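The suspected failure mode above — a freshly closed port still lingering when the next job binds it — is visible as TCP TIME-WAIT state. A sketch of checking for that before a new job grabs ports from the slurm resv-ports range; the filter is plain awk so it can be demonstrated on a canned `ss -tan`-style line (live, you would feed it real `ss -tan` output), and the port range 12000-12010 is made up:

```shell
# Print local ports in the reserved range that are still in TIME-WAIT,
# i.e. not yet fully released by the previous session.
lo=12000 hi=12010
lingering() {
  awk -v lo="$lo" -v hi="$hi" '$1 == "TIME-WAIT" {
    split($4, a, ":");                     # column 4 is local host:port
    if (a[2]+0 >= lo+0 && a[2]+0 <= hi+0) print a[2]
  }'
}

stuck=$(printf 'TIME-WAIT 0 0 10.0.0.1:12003 10.0.0.2:44218\n' | lingering)
echo "$stuck"
```

If rapid back-to-back sruns show ports from the reserved range in this output, that supports the release-lag theory; widening the reserved range or pacing job submission would then be the workaround.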
Re: [OMPI users] srun and openmpi
Was this ever committed to the OMPI src as something not having to be run outside of OpenMPI, but as part of the PSM setup that OpenMPI does? I'm having some trouble getting Slurm/OpenMPI to play nice with the setup of this key. Namely, with slurm you cannot export variables from the --prolog of an srun, only from an --task-prolog; unfortunately, if you use a task-prolog each rank gets a different key, which doesn't work. I'm also guessing that each unique mpirun needs its own psm key, not one for the whole system, so i can't just make it a permanent parameter somewhere else. Also, i recall reading somewhere that the --resv-ports parameter that OMPI uses from slurm to choose a list of ports to use for TCP comm's, tries to lock a port from the pool three times before giving up. Can someone tell me where that parameter is set, i'd like to set it to a higher value. We're seeing issues where running a large number of short sruns sequentially is causing some of the mpiruns in the stream to be killed because they could not lock the ports. I suspect that because the lag between when the port is actually closed in linux and when ompi re-opens a new port is very short, we're trying three times and giving up. I have more than enough ports in the resv-ports list, 30k, but i suspect there is some random re-use being done and it's failing thanks On Mon, Jan 3, 2011 at 10:00 AM, Jeff Squyres <jsquy...@cisco.com> wrote: > Yo Ralph -- > > I see this was committed https://svn.open-mpi.org/trac/ompi/changeset/24197. > Do you want to add a blurb in README about it, and/or have this executable > compiled as part of the PSM MTL and then installed into $bindir (maybe named > ompi-psm-keygen)? > > Right now, it's only compiled as part of "make check" and not installed, > right? > > On Dec 30, 2010, at 5:07 PM, Ralph Castain wrote: > >> Run the program only once - it can be in the prolog of the job if you like. >> The output value needs to be in the env of every rank. 
>> >> You can reuse the value as many times as you like - it doesn't have to be >> unique for each job. There is nothing magic about the value itself. >> >> On Dec 30, 2010, at 2:11 PM, Michael Di Domenico wrote: >> >>> How early does this need to run? Can I run it as part of a task >>> prolog, or does it need to be the shell env for each rank? And does >>> it need to run on one node or all the nodes in the job? >>> >>> On Thu, Dec 30, 2010 at 8:54 PM, Ralph Castain <r...@open-mpi.org> wrote: >>>> Well, I couldn't do it as a patch - proved too complicated as the psm >>>> system looks for the value early in the boot procedure. >>>> >>>> What I can do is give you the attached key generator program. It outputs >>>> the envar required to run your program. So if you run the attached program >>>> and then export the output into your environment, you should be okay. >>>> Looks like this: >>>> >>>> $ ./psm_keygen >>>> OMPI_MCA_orte_precondition_transports=0099b3eaa2c1547e-afb287789133a954 >>>> $ >>>> >>>> You compile the program with the usual mpicc. >>>> >>>> Let me know if this solves the problem (or not).
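The psm_keygen value quoted above is two 64-bit hex words joined by a dash. A rough illustration of that format using /dev/urandom — not a substitute for the real generator, just to show what ends up in the environment variable that every rank must see:

```shell
# Build a value shaped like psm_keygen's output
# (e.g. 0099b3eaa2c1547e-afb287789133a954) and export it the same way.
key=$(od -An -tx8 -N16 /dev/urandom | tr -s ' ' '-' | sed 's/^-//; s/-$//')
export OMPI_MCA_orte_precondition_transports="$key"
echo "$OMPI_MCA_orte_precondition_transports"
```

Since the thread establishes the value need not be unique per job, generating it once in a job prolog and exporting it into every rank's environment is sufficient.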
[OMPI users] Ofed v1.5.3?
Does OpenMPI v1.5.3 support OFED v1.5.3.1?
Re: [OMPI users] alltoall messages > 2^26
Here's a chunk of code that reproduces the error every time on my cluster. If you call it with $((2**24)) as a parameter it should run fine; change it to $((2**27)) and it will stall On Tue, Apr 5, 2011 at 11:24 AM, Terry Dontje <terry.don...@oracle.com> wrote: > It was asked during the community concall whether the below may be related > to ticket #2722 https://svn.open-mpi.org/trac/ompi/ticket/2722? > > --td > > On 04/04/2011 10:17 PM, David Zhang wrote: > > Any error messages? Maybe the nodes ran out of memory? I know MPI > implements some kind of buffering under the hood, so even though you're > sending arrays over 2^26 in size, it may require more than that for MPI to > actually send it. > > On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico < > mdidomeni...@gmail.com> wrote: > >> Has anyone seen an issue where OpenMPI/Infiniband hangs when sending >> messages over 2^26 in size? >> >> For a reason i have not determined just yet machines on my cluster >> (OpenMPI v1.5 and Qlogic Stack/QDR IB Adapters) is failing to send >> arrays over 2^26 in size via the AllToAll collective. (user code) >> >> Further testing seems to indicate that an MPI message over 2^26 fails >> (tested with IMB-MPI) >> >> Running the same test on a different older IB connected cluster seems >> to work, which would seem to indicate a problem with the infiniband >> drivers of some sort rather than openmpi (but i'm not sure). >> >> Any thoughts, directions, or tests? > > -- > David Zhang > University of California, San Diego > > -- > Terry D. 
Dontje | Principal Software Engineer > Developer Tools Engineering | +1.781.442.2631 > Oracle - Performance Technologies > 95 Network Drive, Burlington, MA 01803 > Email terry.don...@oracle.com

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <assert.h>
#include <mpi.h>

typedef signed char int8;
typedef unsigned char uint8;
typedef short int16;
typedef unsigned short uint16;
typedef int int32;
typedef unsigned int uint32;
typedef long int64;
typedef unsigned long uint64;

#define I64(c) (c##L)
#define UI64(c) (c##uL)

#define _BR_RUNUP_ 128
#define _BR_LG_TABSZ_ 7
#define _BR_TABSZ_ (I64(1) << _BR_LG_TABSZ_)

#define _ZERO64 UI64(0x0)
#define _maskl(x) (((x) == 0) ? _ZERO64 : ((~_ZERO64) << (64-(x))))
#define _maskr(x) (((x) == 0) ? _ZERO64 : ((~_ZERO64) >> (64-(x))))

#define _BR_64STEP_(H,L,A,B) {\
  uint64 x;\
  x = H ^ (H << A) ^ (L >> (64 - A));\
  H = L | (x >> (B - 64));\
  L = x << (128 - B);\
}

static uint64_t _rtc() {
  unsigned hi, lo;
  asm volatile ("rdtsc" : "=a" (lo), "=d" (hi));
  return (uint64_t)hi << 32 | lo;
}

typedef struct {
  uint64 hi, lo, ind;
  uint64 tab[_BR_TABSZ_];
} brand_t;

static uint64 brand (brand_t *p) {
  uint64 hi=p->hi, lo=p->lo, i=p->ind, ret;

  ret = p->tab[i];

  // 64-step a primitive trinomial LRS: 0, 45, 118
  _BR_64STEP_(hi,lo,45,118);

  p->tab[i] = ret + hi;
  p->hi = hi;
  p->lo = lo;
  p->ind = hi & _maskr(_BR_LG_TABSZ_);

  return ret;
}

static void brand_init (brand_t *p, uint64 val) {
  int64 i;
  uint64 hi, lo;

  hi = UI64(0x9ccae22ed2c6e578) ^ val;
  lo = UI64(0xce4db5d70739bd22) & _maskl(118-64);

  // we 64-step 0, 33, 118 in the initialization
  for (i = 0; i < 64; i++)
    _BR_64STEP_(hi,lo,33,118);

  for (i = 0; i < _BR_TABSZ_; i++) {
    _BR_64STEP_(hi,lo,33,118);
    p->tab[i] = hi;
  }
  p->ind = _BR_TABSZ_/2;
  p->hi = hi;
  p->lo = lo;

  for (i = 0; i < _BR_RUNUP_; i++)
    brand(p);
}

void rubbish(brand_t* state, uint64_t n_words, uint64_t array[]) {
  uint64_t i;
  for (i = 0; i < n_words; i++)
    array[i] = brand(state);
}

void usage(const char* prog) {
  int me;
  MPI_Comm_rank(MPI_COMM_WORLD, &me);
  if (me == 0)
    fprintf(stderr, "usage: %s #bytes/process\n", prog);
  exit(2);
}

int main(int argc, char* argv[]) {
  brand_t state;
  int i_proc, n_procs, words_per_chunk, loop;
  size_t array_size;
  uint64_t* source;
  uint64_t* dest;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &n_procs);
  MPI_Comm_rank(MPI_COMM_WORLD, &i_proc);

  MPI_Datatype uint64_type = MPI_LONG;
  MPI_Aint foo = 0;
  MPI_Type_extent(uint64_type, &foo);
  assert(foo == 8);

  if (argc < 2)
    usage(argv[0]);
  arra
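How the reproducer above is exercised in the thread, assuming it is saved as alltoall_repro.c (the file name is mine; the program takes one argument, bytes per process):

```shell
# The two sizes named in the post: 2^24 bytes/process is reported to run
# fine, 2^27 is reported to stall.
ok_size=$((1 << 24))    # 16 MiB per process: works
bad_size=$((1 << 27))   # 128 MiB per process: stalls
echo "$ok_size $bad_size"
# Build and launch (not run here):
#   mpicc -o alltoall_repro alltoall_repro.c
#   mpirun -n <ranks> ./alltoall_repro $bad_size
```

Bisecting between the two sizes (2^25, 2^26) would narrow down exactly where the hang begins, which is useful information for a driver or ticket report.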
Re: [OMPI users] alltoall messages > 2^26
There are no messages being spit out, but i'm not sure i have all the correct debugs turned on. I turned on -debug-devel -debug-daemons and mca_verbose, but it appears that the process just hangs. If it's memory exhaustion, it's not from core memory; these nodes have 48GB of memory. it could be a buffer somewhere, but i'm not sure where On Mon, Apr 4, 2011 at 10:17 PM, David Zhang <solarbik...@gmail.com> wrote: > Any error messages? Maybe the nodes ran out of memory? I know MPI > implements some kind of buffering under the hood, so even though you're > sending arrays over 2^26 in size, it may require more than that for MPI to > actually send it. > > On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico <mdidomeni...@gmail.com> > wrote: >> >> Has anyone seen an issue where OpenMPI/Infiniband hangs when sending >> messages over 2^26 in size? >> >> For a reason i have not determined just yet machines on my cluster >> (OpenMPI v1.5 and Qlogic Stack/QDR IB Adapters) is failing to send >> arrays over 2^26 in size via the AllToAll collective. (user code) >> >> Further testing seems to indicate that an MPI message over 2^26 fails >> (tested with IMB-MPI) >> >> Running the same test on a different older IB connected cluster seems >> to work, which would seem to indicate a problem with the infiniband >> drivers of some sort rather than openmpi (but i'm not sure). >> >> Any thoughts, directions, or tests? > > -- > David Zhang > University of California, San Diego
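A back-of-envelope check for the buffering question raised above, under IMB-style semantics where the message size is per destination: an alltoall then needs roughly n_ranks * msg_size of send buffer per rank, plus the same again on the receive side. The rank count below is made up for illustration, and if 2^26/2^27 is instead the whole per-process array (as in the posted reproducer), the totals are far smaller:

```shell
# Rough per-rank send-buffer footprint for an alltoall with a
# per-destination message size.
n_ranks=512
msg_size=$((1 << 27))                  # 128 MiB per destination
send_buf=$((n_ranks * msg_size))
echo "$((send_buf / (1 << 30))) GiB of send buffer per rank"
```

At these sizes a single rank's buffers already exceed the 48GB node memory mentioned above, so which interpretation of "message size" applies decides whether memory exhaustion is even plausible.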
Re: [OMPI users] srun and openmpi
Yes, i am setting the config correctly. Our IB machines seem to run just fine so far using srun and openmpi v1.5. As another data point, we enabled mpi-threads in Openmpi and that also seems to trigger the Srun/TCP behavior, but on the IB fabric. If we run the program within an salloc rather than a straight srun, the problem seems to go away On Tue, Jan 25, 2011 at 2:59 PM, Nathan Hjelm <hje...@lanl.gov> wrote: > We are seeing the similar problem with our infiniband machines. After some > investigation I discovered that we were not setting our slurm environment > correctly (ref: > https://computing.llnl.gov/linux/slurm/mpi_guide.html#open_mpi). Are you > setting the ports in your slurm.conf and executing srun with --resv-ports? > > I have yet to see if this fixes the problem for LANL. Waiting on a sysadmin > to modify the slurm.conf. > > -Nathan > HPC-3, LANL > > On Tue, 25 Jan 2011, Michael Di Domenico wrote: > >> Thanks. We're only seeing it on machines with Ethernet only as the >> interconnect. fortunately for us that only equates to one small >> machine, but it's still annoying. unfortunately, i don't have enough >> knowledge to dive into the code to help fix, but i can certainly help >> test >> >> On Mon, Jan 24, 2011 at 1:41 PM, Nathan Hjelm <hje...@lanl.gov> wrote: >>> >>> I am seeing similar issues on our slurm clusters. We are looking into the >>> issue. >>> >>> -Nathan >>> HPC-3, LANL >>> >>> On Tue, 11 Jan 2011, Michael Di Domenico wrote: >>> >>>> Any ideas on what might be causing this one? Or at least what >>>> additional debug information someone might need? >>>> >>>> On Fri, Jan 7, 2011 at 4:03 PM, Michael Di Domenico >>>> <mdidomeni...@gmail.com> wrote: >>>>> >>>>> I'm still testing the slurm integration, which seems to work fine so
However, i just upgraded another cluster to openmpi-1.5 and >>>>> slurm 2.1.15 but this machine has no infiniband >>>>> >>>>> if i salloc the nodes and mpirun the command it seems to run and >>>>> complete >>>>> fine >>>>> however if i srun the command i get >>>>> >>>>> [btl_tcp_endpoint:486] mca_btl_tcp_endpoint_recv_connect_ack received >>>>> unexpected prcoess identifier >>>>> >>>>> the job does not seem to run, but exhibits two behaviors >>>>> running a single process per node the job runs and does not present >>>>> the error (srun -N40 --ntasks-per-node=1) >>>>> running multiple processes per node, the job spits out the error but >>>>> does not run (srun -n40 --ntasks-per-node=8) >>>>> >>>>> I copied the configs from the other machine, so (i think) everything >>>>> should be configured correctly (but i can't rule it out) >>>>> >>>>> I saw (and reported) a similar error to above with the 1.4-dev branch >>>>> (see mailing list) and slurm, I can't say whether they're related or >>>>> not though >>>>> >>>>> >>>>> On Mon, Jan 3, 2011 at 3:00 PM, Jeff Squyres <jsquy...@cisco.com> >>>>> wrote: >>>>>> >>>>>> Yo Ralph -- >>>>>> >>>>>> I see this was committed >>>>>> https://svn.open-mpi.org/trac/ompi/changeset/24197. Do you want to >>>>>> add a >>>>>> blurb in README about it, and/or have this executable compiled as part >>>>>> of >>>>>> the PSM MTL and then installed into $bindir (maybe named >>>>>> ompi-psm-keygen)? >>>>>> >>>>>> Right now, it's only compiled as part of "make check" and not >>>>>> installed, >>>>>> right? >>>>>> >>>>>> >>>>>> >>>>>> On Dec 30, 2010, at 5:07 PM, Ralph Castain wrote: >>>>>> >>>>>>> Run the program only once - it can be in the prolog of the job if you >>>>>>> like. The output value needs to be in the env of every rank. >>>>>>> >>>>>>> You can reuse the value as many times as you like - it doesn't have >>>>>>> to >>>>>>> be unique for each job. There is nothing magic about the value >>>>>>> itself. >>>>>>> >>>>>>>
Re: [OMPI users] srun and openmpi
Thanks. We're only seeing it on machines with Ethernet only as the interconnect. fortunately for us that only equates to one small machine, but it's still annoying. unfortunately, i don't have enough knowledge to dive into the code to help fix, but i can certainly help test On Mon, Jan 24, 2011 at 1:41 PM, Nathan Hjelm <hje...@lanl.gov> wrote: > I am seeing similar issues on our slurm clusters. We are looking into the > issue. > > -Nathan > HPC-3, LANL > > On Tue, 11 Jan 2011, Michael Di Domenico wrote: > >> Any ideas on what might be causing this one? Or atleast what >> additional debug information someone might need? >> >> On Fri, Jan 7, 2011 at 4:03 PM, Michael Di Domenico >> <mdidomeni...@gmail.com> wrote: >>> >>> I'm still testing the slurm integration, which seems to work fine so >>> far. However, i just upgraded another cluster to openmpi-1.5 and >>> slurm 2.1.15 but this machine has no infiniband >>> >>> if i salloc the nodes and mpirun the command it seems to run and complete >>> fine >>> however if i srun the command i get >>> >>> [btl_tcp_endpoint:486] mca_btl_tcp_endpoint_recv_connect_ack received >>> unexpected prcoess identifier >>> >>> the job does not seem to run, but exhibits two behaviors >>> running a single process per node the job runs and does not present >>> the error (srun -N40 --ntasks-per-node=1) >>> running multiple processes per node, the job spits out the error but >>> does not run (srun -n40 --ntasks-per-node=8) >>> >>> I copied the configs from the other machine, so (i think) everything >>> should be configured correctly (but i can't rule it out) >>> >>> I saw (and reported) a similar error to above with the 1.4-dev branch >>> (see mailing list) and slurm, I can't say whether they're related or >>> not though >>> >>> >>> On Mon, Jan 3, 2011 at 3:00 PM, Jeff Squyres <jsquy...@cisco.com> wrote: >>>> >>>> Yo Ralph -- >>>> >>>> I see this was committed >>>> https://svn.open-mpi.org/trac/ompi/changeset/24197. 
Do you want to add a >>>> blurb in README about it, and/or have this executable compiled as part of >>>> the PSM MTL and then installed into $bindir (maybe named ompi-psm-keygen)? >>>> >>>> Right now, it's only compiled as part of "make check" and not installed, >>>> right? >>>> >>>> >>>> >>>> On Dec 30, 2010, at 5:07 PM, Ralph Castain wrote: >>>> >>>>> Run the program only once - it can be in the prolog of the job if you >>>>> like. The output value needs to be in the env of every rank. >>>>> >>>>> You can reuse the value as many times as you like - it doesn't have to >>>>> be unique for each job. There is nothing magic about the value itself. >>>>> >>>>> On Dec 30, 2010, at 2:11 PM, Michael Di Domenico wrote: >>>>> >>>>>> How early does this need to run? Can I run it as part of a task >>>>>> prolog, or does it need to be the shell env for each rank? And does >>>>>> it need to run on one node or all the nodes in the job? >>>>>> >>>>>> On Thu, Dec 30, 2010 at 8:54 PM, Ralph Castain <r...@open-mpi.org> >>>>>> wrote: >>>>>>> >>>>>>> Well, I couldn't do it as a patch - proved too complicated as the psm >>>>>>> system looks for the value early in the boot procedure. >>>>>>> >>>>>>> What I can do is give you the attached key generator program. It >>>>>>> outputs the envar required to run your program. So if you run the >>>>>>> attached >>>>>>> program and then export the output into your environment, you should be >>>>>>> okay. Looks like this: >>>>>>> >>>>>>> $ ./psm_keygen >>>>>>> >>>>>>> OMPI_MCA_orte_precondition_transports=0099b3eaa2c1547e-afb287789133a954 >>>>>>> $ >>>>>>> >>>>>>> You compile the program with the usual mpicc. >>>>>>> >>>>>>> Let me know if this solves the problem (or not). >>>>>>> Ralph >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On
Re: [OMPI users] srun and openmpi
Any ideas on what might be causing this one? Or atleast what additional debug information someone might need? On Fri, Jan 7, 2011 at 4:03 PM, Michael Di Domenico <mdidomeni...@gmail.com> wrote: > I'm still testing the slurm integration, which seems to work fine so > far. However, i just upgraded another cluster to openmpi-1.5 and > slurm 2.1.15 but this machine has no infiniband > > if i salloc the nodes and mpirun the command it seems to run and complete fine > however if i srun the command i get > > [btl_tcp_endpoint:486] mca_btl_tcp_endpoint_recv_connect_ack received > unexpected prcoess identifier > > the job does not seem to run, but exhibits two behaviors > running a single process per node the job runs and does not present > the error (srun -N40 --ntasks-per-node=1) > running multiple processes per node, the job spits out the error but > does not run (srun -n40 --ntasks-per-node=8) > > I copied the configs from the other machine, so (i think) everything > should be configured correctly (but i can't rule it out) > > I saw (and reported) a similar error to above with the 1.4-dev branch > (see mailing list) and slurm, I can't say whether they're related or > not though > > > On Mon, Jan 3, 2011 at 3:00 PM, Jeff Squyres <jsquy...@cisco.com> wrote: >> Yo Ralph -- >> >> I see this was committed https://svn.open-mpi.org/trac/ompi/changeset/24197. >> Do you want to add a blurb in README about it, and/or have this executable >> compiled as part of the PSM MTL and then installed into $bindir (maybe named >> ompi-psm-keygen)? >> >> Right now, it's only compiled as part of "make check" and not installed, >> right? >> >> >> >> On Dec 30, 2010, at 5:07 PM, Ralph Castain wrote: >> >>> Run the program only once - it can be in the prolog of the job if you like. >>> The output value needs to be in the env of every rank. >>> >>> You can reuse the value as many times as you like - it doesn't have to be >>> unique for each job. There is nothing magic about the value itself. 
>>> >>> On Dec 30, 2010, at 2:11 PM, Michael Di Domenico wrote: >>> >>>> How early does this need to run? Can I run it as part of a task >>>> prolog, or does it need to be the shell env for each rank? And does >>>> it need to run on one node or all the nodes in the job? >>>> >>>> On Thu, Dec 30, 2010 at 8:54 PM, Ralph Castain <r...@open-mpi.org> wrote: >>>>> Well, I couldn't do it as a patch - proved too complicated as the psm >>>>> system looks for the value early in the boot procedure. >>>>> >>>>> What I can do is give you the attached key generator program. It outputs >>>>> the envar required to run your program. So if you run the attached >>>>> program and then export the output into your environment, you should be >>>>> okay. Looks like this: >>>>> >>>>> $ ./psm_keygen >>>>> OMPI_MCA_orte_precondition_transports=0099b3eaa2c1547e-afb287789133a954 >>>>> $ >>>>> >>>>> You compile the program with the usual mpicc. >>>>> >>>>> Let me know if this solves the problem (or not). >>>>> Ralph >>>>> >>>>> >>>>> >>>>> >>>>> On Dec 30, 2010, at 11:18 AM, Michael Di Domenico wrote: >>>>> >>>>>> Sure, i'll give it a go >>>>>> >>>>>> On Thu, Dec 30, 2010 at 5:53 PM, Ralph Castain <r...@open-mpi.org> wrote: >>>>>>> Ah, yes - that is going to be a problem. The PSM key gets generated by >>>>>>> mpirun as it is shared info - i.e., every proc has to get the same >>>>>>> value. >>>>>>> >>>>>>> I can create a patch that will do this for the srun direct-launch >>>>>>> scenario, if you want to try it. Would be later today, though. >>>>>>> >>>>>>> >>>>>>> On Dec 30, 2010, at 10:31 AM, Michael Di Domenico wrote: >>>>>>> >>>>>>>> Well maybe not horray, yet. I might have jumped the gun a bit, it's >>>>>>>> looking like srun works in general, but perhaps not with PSM >>>>>>>> >>>>>>>> With PSM i get this error, (at least now i know what i changed) >>>>>>>> >>>>>>>> Error obtaining unique transport key from ORTE &g
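Since psm_keygen's output is just an environment variable, a job prolog can export it once for all ranks. The generator itself was a list attachment, so the sketch below uses /dev/urandom purely as a stand-in to show the apparent format (two 16-hex-digit halves joined by a dash, as seen in Ralph's transcript); every rank must see the same value.

```shell
# Mimic the shape of psm_keygen's output (format assumed from the
# transcript above); a real prolog would run the actual keygen once
# and export its line into every rank's environment.
key=$(od -An -tx8 -N16 /dev/urandom | tr -s ' ' | sed 's/^ *//; s/ *$//; s/ /-/')
export OMPI_MCA_orte_precondition_transports="$key"
echo "$OMPI_MCA_orte_precondition_transports"
```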
Re: [OMPI users] CQ errors
2011/1/10 Peter Kjellström <c...@nsc.liu.se>: > On Monday, January 10, 2011 03:06:06 pm Michael Di Domenico wrote: >> I'm not sure if these are being reported from OpenMPI or through >> OpenMPI from OpenFabrics, but i figured this would be a good place to >> start >> >> On one node we received the below errors, i'm not sure i under the >> error sequence, hopefully someone can shed some light on what >> happened. >> >> [[5691,1],49][btl_openib_component.c:3294:handle_wc] from node27 to: > ... >> network is qlogic qdr end to end, openmpi 1.5 and ofed 1.5.2 (q stack) > > Not really addressing your problem, but, with qlogic you should be using psm, > not verbs (btl_openib). > > That said, openib should work (slowly). Yes, you are correct. We're running via verbs at the moment because of a slurm interop issue. I have a patch from ralph but have not tested it yet. So far the only noticeable effect of running non-psm is a 5 usec hit on each packet; otherwise, functionally we seem okay.
[OMPI users] CQ errors
I'm not sure if these are being reported from OpenMPI or through OpenMPI from OpenFabrics, but i figured this would be a good place to start On one node we received the below errors, i'm not sure i understand the error sequence, hopefully someone can shed some light on what happened.
[[5691,1],49][btl_openib_component.c:3294:handle_wc] from node27 to: node28 error polling HP CQ with status WORK_REQUEST FLUSHED ERROR status number 5 for wr_id c30b100 opcode 128 vendor error 0 qp_idx 0
[[5691,1],49][btl_openib_component.c:3294:handle_wc] from node26 to: node28 error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for wr_id 1755c900 opcode 1 vendor error 0 qp_idx 0
[[5691,1],49][btl_openib_component.c:3294:handle_wc] from (null) to: node28 error polling HP CQ with status WORK_REQUEST FLUSHED ERROR status number 5 for wr_id 1779b180 opcode 128 vendor error 0 qp_idx 0
[[5691,1],49][btl_openib_component.c:3294:handle_wc] from node20 to: node28 error polling HP CQ with status WORK_REQUEST FLUSHED ERROR status number 5 for wr_id 8e1aa80 opcode 128 vendor error 0 qp_idx 0
[[5691,1],49][btl_openib_component.c:3294:handle_wc] from node24 to: node28 error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for wr_id 1164b600 opcode 1 vendor error 0 qp_idx 0
[[5691,1],49][btl_openib_component.c:3294:handle_wc] from (null) to: node28 error polling HP CQ with status WORK_REQUEST FLUSHED ERROR status number 5 for wr_id 118c3f80 opcode 128 vendor error 0 qp_idx 0
[[5691,1],49][btl_openib_component.c:3294:handle_wc] from node12 to: node28 error polling HP CQ with status WORK_REQUEST FLUSHED ERROR status number 5 for wr_id 1b8f0080 opcode 128 vendor error 0 qp_idx 0
It was the only node out of a 75 node run that spit out the error. I rechecked the node, no symbol/link recovery errors on the network, and ran Pallas between it and several other machines with no errors network is qlogic qdr end to end, openmpi 1.5 and ofed 1.5.2 (q stack) thanks
Re: [OMPI users] srun and openmpi
I'm still testing the slurm integration, which seems to work fine so far. However, i just upgraded another cluster to openmpi-1.5 and slurm 2.1.15 but this machine has no infiniband if i salloc the nodes and mpirun the command it seems to run and complete fine however if i srun the command i get [btl_tcp_endpoint:486] mca_btl_tcp_endpoint_recv_connect_ack received unexpected prcoess identifier the job does not seem to run, but exhibits two behaviors running a single process per node the job runs and does not present the error (srun -N40 --ntasks-per-node=1) running multiple processes per node, the job spits out the error but does not run (srun -n40 --ntasks-per-node=8) I copied the configs from the other machine, so (i think) everything should be configured correctly (but i can't rule it out) I saw (and reported) a similar error to above with the 1.4-dev branch (see mailing list) and slurm, I can't say whether they're related or not though On Mon, Jan 3, 2011 at 3:00 PM, Jeff Squyres <jsquy...@cisco.com> wrote: > Yo Ralph -- > > I see this was committed https://svn.open-mpi.org/trac/ompi/changeset/24197. > Do you want to add a blurb in README about it, and/or have this executable > compiled as part of the PSM MTL and then installed into $bindir (maybe named > ompi-psm-keygen)? > > Right now, it's only compiled as part of "make check" and not installed, > right? > > > > On Dec 30, 2010, at 5:07 PM, Ralph Castain wrote: > >> Run the program only once - it can be in the prolog of the job if you like. >> The output value needs to be in the env of every rank. >> >> You can reuse the value as many times as you like - it doesn't have to be >> unique for each job. There is nothing magic about the value itself. >> >> On Dec 30, 2010, at 2:11 PM, Michael Di Domenico wrote: >> >>> How early does this need to run? Can I run it as part of a task >>> prolog, or does it need to be the shell env for each rank? And does >>> it need to run on one node or all the nodes in the job? 
>>> >>> On Thu, Dec 30, 2010 at 8:54 PM, Ralph Castain <r...@open-mpi.org> wrote: >>>> Well, I couldn't do it as a patch - proved too complicated as the psm >>>> system looks for the value early in the boot procedure. >>>> >>>> What I can do is give you the attached key generator program. It outputs >>>> the envar required to run your program. So if you run the attached program >>>> and then export the output into your environment, you should be okay. >>>> Looks like this: >>>> >>>> $ ./psm_keygen >>>> OMPI_MCA_orte_precondition_transports=0099b3eaa2c1547e-afb287789133a954 >>>> $ >>>> >>>> You compile the program with the usual mpicc. >>>> >>>> Let me know if this solves the problem (or not). >>>> Ralph >>>> >>>> >>>> >>>> >>>> On Dec 30, 2010, at 11:18 AM, Michael Di Domenico wrote: >>>> >>>>> Sure, i'll give it a go >>>>> >>>>> On Thu, Dec 30, 2010 at 5:53 PM, Ralph Castain <r...@open-mpi.org> wrote: >>>>>> Ah, yes - that is going to be a problem. The PSM key gets generated by >>>>>> mpirun as it is shared info - i.e., every proc has to get the same value. >>>>>> >>>>>> I can create a patch that will do this for the srun direct-launch >>>>>> scenario, if you want to try it. Would be later today, though. >>>>>> >>>>>> >>>>>> On Dec 30, 2010, at 10:31 AM, Michael Di Domenico wrote: >>>>>> >>>>>>> Well maybe not horray, yet. I might have jumped the gun a bit, it's >>>>>>> looking like srun works in general, but perhaps not with PSM >>>>>>> >>>>>>> With PSM i get this error, (at least now i know what i changed) >>>>>>> >>>>>>> Error obtaining unique transport key from ORTE >>>>>>> (orte_precondition_transports not present in the environment) >>>>>>> PML add procs failed >>>>>>> --> Returned "Error" (-1) instead of "Success" (0) >>>>>>> >>>>>>> Turn off PSM and srun works fine >>>>>>> >>>>>>> >>>>>>> On Thu, Dec 30, 2010 at 5:13 PM, Ralph Castain <r...@open-mpi.org> >>>>>>> wrote: >>>>>>>> Hooray! >>>>>>>> >&g
Re: [OMPI users] srun and openmpi
How early does this need to run? Can I run it as part of a task prolog, or does it need to be the shell env for each rank? And does it need to run on one node or all the nodes in the job? On Thu, Dec 30, 2010 at 8:54 PM, Ralph Castain <r...@open-mpi.org> wrote: > Well, I couldn't do it as a patch - proved too complicated as the psm system > looks for the value early in the boot procedure. > > What I can do is give you the attached key generator program. It outputs the > envar required to run your program. So if you run the attached program and > then export the output into your environment, you should be okay. Looks like > this: > > $ ./psm_keygen > OMPI_MCA_orte_precondition_transports=0099b3eaa2c1547e-afb287789133a954 > $ > > You compile the program with the usual mpicc. > > Let me know if this solves the problem (or not). > Ralph > > > > > On Dec 30, 2010, at 11:18 AM, Michael Di Domenico wrote: > >> Sure, i'll give it a go >> >> On Thu, Dec 30, 2010 at 5:53 PM, Ralph Castain <r...@open-mpi.org> wrote: >>> Ah, yes - that is going to be a problem. The PSM key gets generated by >>> mpirun as it is shared info - i.e., every proc has to get the same value. >>> >>> I can create a patch that will do this for the srun direct-launch scenario, >>> if you want to try it. Would be later today, though. >>> >>> >>> On Dec 30, 2010, at 10:31 AM, Michael Di Domenico wrote: >>> >>>> Well maybe not horray, yet. I might have jumped the gun a bit, it's >>>> looking like srun works in general, but perhaps not with PSM >>>> >>>> With PSM i get this error, (at least now i know what i changed) >>>> >>>> Error obtaining unique transport key from ORTE >>>> (orte_precondition_transports not present in the environment) >>>> PML add procs failed >>>> --> Returned "Error" (-1) instead of "Success" (0) >>>> >>>> Turn off PSM and srun works fine >>>> >>>> >>>> On Thu, Dec 30, 2010 at 5:13 PM, Ralph Castain <r...@open-mpi.org> wrote: >>>>> Hooray! 
>>>>> >>>>> On Dec 30, 2010, at 9:57 AM, Michael Di Domenico wrote: >>>>> >>>>>> I think i take it all back. I just tried it again and it seems to >>>>>> work now. I'm not sure what I changed (between my first and this >>>>>> msg), but it does appear to work now. >>>>>> >>>>>> On Thu, Dec 30, 2010 at 4:31 PM, Michael Di Domenico >>>>>> <mdidomeni...@gmail.com> wrote: >>>>>>> Yes that's true, error messages help. I was hoping there was some >>>>>>> documentation to see what i've done wrong. I can't easily cut and >>>>>>> paste errors from my cluster. >>>>>>> >>>>>>> Here's a snippet (hand typed) of the error message, but it does look >>>>>>> like a rank communications error >>>>>>> >>>>>>> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose >>>>>>> contact information is unknown in file rml_oob_send.c at line 145. >>>>>>> *** MPI_INIT failure message (snipped) *** >>>>>>> orte_grpcomm_modex failed >>>>>>> --> Returned "A messages is attempting to be sent to a process whose >>>>>>> contact information us uknown" (-117) instead of "Success" (0) >>>>>>> >>>>>>> This msg repeats for each rank, an ultimately hangs the srun which i >>>>>>> have to Ctrl-C and terminate >>>>>>> >>>>>>> I have mpiports defined in my slurm config and running srun with >>>>>>> -resv-ports does show the SLURM_RESV_PORTS environment variable >>>>>>> getting parts to the shell >>>>>>> >>>>>>> >>>>>>> On Thu, Dec 23, 2010 at 8:09 PM, Ralph Castain <r...@open-mpi.org> >>>>>>> wrote: >>>>>>>> I'm not sure there is any documentation yet - not much clamor for it. >>>>>>>> :-/ >>>>>>>> >>>>>>>> It would really help if you included the error message. Otherwise, all >>>>>>>> I can do is guess, which wastes both of our time :-( >>>>>>>> >>>&g
Re: [OMPI users] srun and openmpi
Sure, i'll give it a go On Thu, Dec 30, 2010 at 5:53 PM, Ralph Castain <r...@open-mpi.org> wrote: > Ah, yes - that is going to be a problem. The PSM key gets generated by mpirun > as it is shared info - i.e., every proc has to get the same value. > > I can create a patch that will do this for the srun direct-launch scenario, > if you want to try it. Would be later today, though. > > > On Dec 30, 2010, at 10:31 AM, Michael Di Domenico wrote: > >> Well maybe not horray, yet. I might have jumped the gun a bit, it's >> looking like srun works in general, but perhaps not with PSM >> >> With PSM i get this error, (at least now i know what i changed) >> >> Error obtaining unique transport key from ORTE >> (orte_precondition_transports not present in the environment) >> PML add procs failed >> --> Returned "Error" (-1) instead of "Success" (0) >> >> Turn off PSM and srun works fine >> >> >> On Thu, Dec 30, 2010 at 5:13 PM, Ralph Castain <r...@open-mpi.org> wrote: >>> Hooray! >>> >>> On Dec 30, 2010, at 9:57 AM, Michael Di Domenico wrote: >>> >>>> I think i take it all back. I just tried it again and it seems to >>>> work now. I'm not sure what I changed (between my first and this >>>> msg), but it does appear to work now. >>>> >>>> On Thu, Dec 30, 2010 at 4:31 PM, Michael Di Domenico >>>> <mdidomeni...@gmail.com> wrote: >>>>> Yes that's true, error messages help. I was hoping there was some >>>>> documentation to see what i've done wrong. I can't easily cut and >>>>> paste errors from my cluster. >>>>> >>>>> Here's a snippet (hand typed) of the error message, but it does look >>>>> like a rank communications error >>>>> >>>>> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose >>>>> contact information is unknown in file rml_oob_send.c at line 145. 
>>>>> *** MPI_INIT failure message (snipped) *** >>>>> orte_grpcomm_modex failed >>>>> --> Returned "A messages is attempting to be sent to a process whose >>>>> contact information us uknown" (-117) instead of "Success" (0) >>>>> >>>>> This msg repeats for each rank, an ultimately hangs the srun which i >>>>> have to Ctrl-C and terminate >>>>> >>>>> I have mpiports defined in my slurm config and running srun with >>>>> -resv-ports does show the SLURM_RESV_PORTS environment variable >>>>> getting parts to the shell >>>>> >>>>> >>>>> On Thu, Dec 23, 2010 at 8:09 PM, Ralph Castain <r...@open-mpi.org> wrote: >>>>>> I'm not sure there is any documentation yet - not much clamor for it. :-/ >>>>>> >>>>>> It would really help if you included the error message. Otherwise, all I >>>>>> can do is guess, which wastes both of our time :-( >>>>>> >>>>>> My best guess is that the port reservation didn't get passed down to the >>>>>> MPI procs properly - but that's just a guess. >>>>>> >>>>>> >>>>>> On Dec 23, 2010, at 12:46 PM, Michael Di Domenico wrote: >>>>>> >>>>>>> Can anyone point me towards the most recent documentation for using >>>>>>> srun and openmpi? >>>>>>> >>>>>>> I followed what i found on the web with enabling the MpiPorts config >>>>>>> in slurm and using the --resv-ports switch, but I'm getting an error >>>>>>> from openmpi during setup. >>>>>>> >>>>>>> I'm using Slurm 2.1.15 and Openmpi 1.5 w/PSM >>>>>>> >>>>>>> I'm sure I'm missing a step. 
>>>>>>> >>>>>>> Thanks >>>>>>> ___ >>>>>>> users mailing list >>>>>>> us...@open-mpi.org >>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>> >>>>>> >>>>>> ___ >>>>>> users mailing list >>>>>> us...@open-mpi.org >>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>>>> >>>>> >>>> >>>> ___ >>>> users mailing list >>>> us...@open-mpi.org >>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>> >>> >>> ___ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>> >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] srun and openmpi
Well maybe not horray, yet. I might have jumped the gun a bit, it's looking like srun works in general, but perhaps not with PSM With PSM i get this error, (at least now i know what i changed) Error obtaining unique transport key from ORTE (orte_precondition_transports not present in the environment) PML add procs failed --> Returned "Error" (-1) instead of "Success" (0) Turn off PSM and srun works fine On Thu, Dec 30, 2010 at 5:13 PM, Ralph Castain <r...@open-mpi.org> wrote: > Hooray! > > On Dec 30, 2010, at 9:57 AM, Michael Di Domenico wrote: > >> I think i take it all back. I just tried it again and it seems to >> work now. I'm not sure what I changed (between my first and this >> msg), but it does appear to work now. >> >> On Thu, Dec 30, 2010 at 4:31 PM, Michael Di Domenico >> <mdidomeni...@gmail.com> wrote: >>> Yes that's true, error messages help. I was hoping there was some >>> documentation to see what i've done wrong. I can't easily cut and >>> paste errors from my cluster. >>> >>> Here's a snippet (hand typed) of the error message, but it does look >>> like a rank communications error >>> >>> ORTE_ERROR_LOG: A message is attempting to be sent to a process whose >>> contact information is unknown in file rml_oob_send.c at line 145. >>> *** MPI_INIT failure message (snipped) *** >>> orte_grpcomm_modex failed >>> --> Returned "A messages is attempting to be sent to a process whose >>> contact information us uknown" (-117) instead of "Success" (0) >>> >>> This msg repeats for each rank, an ultimately hangs the srun which i >>> have to Ctrl-C and terminate >>> >>> I have mpiports defined in my slurm config and running srun with >>> -resv-ports does show the SLURM_RESV_PORTS environment variable >>> getting parts to the shell >>> >>> >>> On Thu, Dec 23, 2010 at 8:09 PM, Ralph Castain <r...@open-mpi.org> wrote: >>>> I'm not sure there is any documentation yet - not much clamor for it. :-/ >>>> >>>> It would really help if you included the error message. 
Otherwise, all I >>>> can do is guess, which wastes both of our time :-( >>>> >>>> My best guess is that the port reservation didn't get passed down to the >>>> MPI procs properly - but that's just a guess. >>>> >>>> >>>> On Dec 23, 2010, at 12:46 PM, Michael Di Domenico wrote: >>>> >>>>> Can anyone point me towards the most recent documentation for using >>>>> srun and openmpi? >>>>> >>>>> I followed what i found on the web with enabling the MpiPorts config >>>>> in slurm and using the --resv-ports switch, but I'm getting an error >>>>> from openmpi during setup. >>>>> >>>>> I'm using Slurm 2.1.15 and Openmpi 1.5 w/PSM >>>>> >>>>> I'm sure I'm missing a step. >>>>> >>>>> Thanks >>>>> ___ >>>>> users mailing list >>>>> us...@open-mpi.org >>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>> >>>> >>>> ___ >>>> users mailing list >>>> us...@open-mpi.org >>>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>>> >>> >> >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] srun and openmpi
Yes that's true, error messages help. I was hoping there was some documentation to see what i've done wrong. I can't easily cut and paste errors from my cluster. Here's a snippet (hand typed) of the error message, but it does look like a rank communications error ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 145. *** MPI_INIT failure message (snipped) *** orte_grpcomm_modex failed --> Returned "A message is attempting to be sent to a process whose contact information is unknown" (-117) instead of "Success" (0) This msg repeats for each rank, and ultimately hangs the srun which i have to Ctrl-C and terminate I have mpiports defined in my slurm config and running srun with -resv-ports does show the SLURM_RESV_PORTS environment variable getting passed to the shell On Thu, Dec 23, 2010 at 8:09 PM, Ralph Castain <r...@open-mpi.org> wrote: > I'm not sure there is any documentation yet - not much clamor for it. :-/ > > It would really help if you included the error message. Otherwise, all I can > do is guess, which wastes both of our time :-( > > My best guess is that the port reservation didn't get passed down to the MPI > procs properly - but that's just a guess. > > > On Dec 23, 2010, at 12:46 PM, Michael Di Domenico wrote: > >> Can anyone point me towards the most recent documentation for using >> srun and openmpi? >> >> I followed what i found on the web with enabling the MpiPorts config >> in slurm and using the --resv-ports switch, but I'm getting an error >> from openmpi during setup. >> >> I'm using Slurm 2.1.15 and Openmpi 1.5 w/PSM >> >> I'm sure I'm missing a step. >> >> Thanks >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
[OMPI users] flex.exe
openmpi-1.4.1/contrib/platform/win32/bin/flex.exe I understand this file might be required for building on windows; since I'm not, I can just delete the file without issue. However, for those of us under import restrictions, where binaries are not allowed in, this file forces me to open the tarball and delete the file (not a big deal, i know, i know). But can I put in a vote for a pure source-only tree? Thanks...
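Until the tarball is source-only, the deletion step can be scripted. A hedged sketch (GNU tar's --delete only operates on an uncompressed archive, hence the gunzip/gzip round-trip; the member path is the one named above):

```shell
# Strip the bundled Windows binary from the release tarball, then recompress.
gunzip openmpi-1.4.1.tar.gz
tar --delete -f openmpi-1.4.1.tar \
    openmpi-1.4.1/contrib/platform/win32/bin/flex.exe
gzip openmpi-1.4.1.tar
```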
Re: [OMPI users] openmpi 1.4 and barrier
Hmm, i don't recall seeing that... On Thu, Oct 1, 2009 at 1:51 PM, Jeff Squyres <jsquy...@cisco.com> wrote: > FWIW, I saw this bug to have race-condition-like behavior. I could run a > few times and then it would work. > > On Oct 1, 2009, at 1:42 PM, Michael Di Domenico wrote: > >> On Thu, Oct 1, 2009 at 1:37 PM, Jeff Squyres <jsquy...@cisco.com> wrote: >> > On Oct 1, 2009, at 1:24 PM, Michael Di Domenico wrote: >> > >> >> I just upgraded to the devel snapshot of 1.4a1r22031 >> >> >> >> when i run a simple hello world with a barrier i get >> >> >> >> btl_tcp_endpoint.c:484:mca_btl_tcp_endpoint_recv_connect_ack] received >> >> unexpected process identifier >> > >> > I have seen this failure over the last day or three myself. I'll file a >> > trac ticket about it. >> > >> > (all's fair in love, war, and trunk development snapshots!) >> >> Okay, thanks... Unfortunately i need the dev snap for slurm >> intergration... :( >> ___ >> users mailing list >> us...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/users >> > > > -- > Jeff Squyres > jsquy...@cisco.com > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
Re: [OMPI users] openmpi 1.4 and barrier
On Thu, Oct 1, 2009 at 1:37 PM, Jeff Squyres <jsquy...@cisco.com> wrote: > On Oct 1, 2009, at 1:24 PM, Michael Di Domenico wrote: > >> I just upgraded to the devel snapshot of 1.4a1r22031 >> >> when i run a simple hello world with a barrier i get >> >> btl_tcp_endpoint.c:484:mca_btl_tcp_endpoint_recv_connect_ack] received >> unexpected process identifier > > I have seen this failure over the last day or three myself. I'll file a > trac ticket about it. > > (all's fair in love, war, and trunk development snapshots!) Okay, thanks... Unfortunately i need the dev snap for slurm integration... :(
[OMPI users] openmpi 1.4 and barrier
I just upgraded to the devel snapshot of 1.4a1r22031 when i run a simple hello world with a barrier i get btl_tcp_endpoint.c:484:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier if i pull the barrier out the hello world runs fine interestingly enough, i can run IMB which also uses barrier and it runs just fine Any thoughts?
Re: [OMPI users] strange IMB runs
> One of the differences among MPI implementations is the default placement of
> processes within the node. E.g., should processes by default be collocated
> on cores of the same socket or on cores of different sockets? I don't know
> if that issue is applicable here (that is, HP MPI vs Open MPI or on
> Superdome architecture), but it's potentially an issue to look at. With HP
> MPI, mpirun has a -cpu_bind switch for controlling placement. With Open
> MPI, mpirun controls placement with -rankfile.
>
> E.g., what happens if you try
>
> % cat rf1
> rank 0=XX slot=0
> rank 1=XX slot=1
> % cat rf2
> rank 0=XX slot=0
> rank 1=XX slot=2
> % cat rf3
> rank 0=XX slot=0
> rank 1=XX slot=3
> [...etc...]
> % mpirun -np 2 --mca btl self,sm --host XX,XX -rf rf1 $PWD/IMB-MPI1 pingpong
> % mpirun -np 2 --mca btl self,sm --host XX,XX -rf rf2 $PWD/IMB-MPI1 pingpong
> % mpirun -np 2 --mca btl self,sm --host XX,XX -rf rf3 $PWD/IMB-MPI1 pingpong
> [...etc...]
>
> where XX is the name of your node and you march through all the cores on
> your Superdome node?

I tried this, but it didn't seem to make a difference either.
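For what it's worth, marching through every core without hand-writing each rankfile could be scripted along these lines (a sketch only; `nodeXX` and the slot numbering are placeholders for the real host name and core layout, and the actual mpirun line is left commented out):

```shell
#!/bin/sh
# Hypothetical helper: print a two-rank rankfile that pins rank 0 to
# slot 0 and rank 1 to a given slot on the same host.
make_rankfile() {
  # $1 = hostname, $2 = slot for rank 1
  printf 'rank 0=%s slot=0\nrank 1=%s slot=%s\n' "$1" "$1" "$2"
}

# Generate rf1..rf3 for slots 1..3; extend the list for more cores.
for slot in 1 2 3; do
  make_rankfile nodeXX "$slot" > "rf$slot"
  # mpirun -np 2 --mca btl self,sm --host nodeXX,nodeXX \
  #   -rf "rf$slot" $PWD/IMB-MPI1 pingpong
done
```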
Re: [OMPI users] strange IMB runs
On Thu, Aug 13, 2009 at 1:51 AM, Eugene Loh wrote:
> Also, I'm puzzled why you should see better results by changing
> btl_sm_eager_limit. That shouldn't change long-message bandwidth, but only
> the message size at which one transitions from short to long messages. If
> anything, tweaking btl_sm_max_send_size would be the variable to try.

Changing this value does tweak the 1, 2, and 4 MB messages, but all the
others are the same, and it's not consistent. I've also noticed that the
latency is nearly 2 usec higher with OpenMPI vs HP-MPI.
Re: [OMPI users] strange IMB runs
On Thu, Aug 13, 2009 at 1:51 AM, Eugene Loh wrote:
>> Is this behavior expected? Are there any tunables to get the OpenMPI
>> sockets up near HP-MPI?
>
> First, I want to understand the configuration. It's just a single node. No
> interconnect (InfiniBand or Ethernet or anything). Right?

Yes, the Superdome arch is Itanium processors all connected internally by
some interconnect. No Ethernet or InfiniBand is at play.
Re: [OMPI users] strange IMB runs
So pushing this along a little more, running with openmpi-1.3 svn rev 20295:

mpirun -np 2 -mca btl sm,self -mca mpi_paffinity_alone 1 -mca
mpi_leave_pinned 1 -mca btl_sm_eager_limit 8192 $PWD/IMB-MPI1 pingpong

yields ~390MB/sec. So we're getting there, but still only about half speed.

On Thu, Aug 6, 2009 at 9:30 AM, Michael Di Domenico <mdidomeni...@gmail.com> wrote:
> Here's an interesting data point. I installed the RHEL rpm version of
> OpenMPI 1.2.7-6 for ia64
>
> mpirun -np 2 -mca btl self,sm -mca mpi_paffinity_alone 1 -mca
> mpi_leave_pinned 1 $PWD/IMB-MPI1 pingpong
>
> With v1.3 and -mca btl self,sm i get ~150MB/sec
> With v1.3 and -mca btl self,tcp i get ~550MB/sec
>
> With v1.2.7-6 and -mca btl self,sm i get ~225MB/sec
> With v1.2.7-6 and -mca btl self,tcp i get ~650MB/sec
>
> On Fri, Jul 31, 2009 at 10:42 AM, Edgar Gabriel <gabr...@cs.uh.edu> wrote:
>> Michael Di Domenico wrote:
>>> mpi_leave_pinned didn't help, still at ~145MB/sec
>>> btl_sm_eager_limit from 4096 to 8192 pushes me up to ~212MB/sec, but
>>> pushing it past that doesn't change it anymore
>>>
>>> Are there any intelligent programs that can go through and test all
>>> the different permutations of tunables for openmpi? Outside of me
>>> just writing an ugly looping script...
>>
>> actually there is,
>>
>> http://svn.open-mpi.org/svn/otpo/trunk/
>>
>> this tool has been used to tune openib parameters, and I would guess that it
>> could be used without any modification to also run netpipe over sm...
>>
>> Thanks
>> Edgar
>>
>>> On Wed, Jul 29, 2009 at 1:55 PM, Dorian Krause <doriankra...@web.de> wrote:
>>>> Hi,
>>>>
>>>> --mca mpi_leave_pinned 1
>>>>
>>>> might help. Take a look at the FAQ for various tuning parameters.
>>>>
>>>> Michael Di Domenico wrote:
>>>>> I'm not sure I understand what's actually happened here. I'm running
>>>>> IMB on an HP superdome, just comparing the PingPong benchmark
>>>>>
>>>>> HP-MPI v2.3
>>>>> Max ~ 700-800MB/sec
>>>>>
>>>>> OpenMPI v1.3
>>>>> -mca btl self,sm - Max ~ 125-150MB/sec
>>>>> -mca btl self,tcp - Max ~ 500-550MB/sec
>>>>>
>>>>> Is this behavior expected? Are there any tunables to get the OpenMPI
>>>>> sockets up near HP-MPI?
[OMPI users] x4100 with IB
I have several Sun x4100s with InfiniBand which appear to be running at
400MB/sec instead of 800MB/sec. It's a freshly reformatted cluster,
converting from Solaris to Linux. We also reset the BIOS settings with
"load optimal defaults". Does anyone know which BIOS setting change might
have cut the bandwidth?

x4100
mellanox ib
ofed-1.4.1-rc6 w/ openmpi
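One quick sanity check is whether the numbers line up with a link that simply trained at a lower speed or width (e.g. SDR instead of DDR, or a narrow PCIe link). A rough back-of-the-envelope calculator, using the usual assumed figures (per-lane signalling rate, 8b/10b encoding leaving 80% of the raw rate for data):

```shell
#!/bin/sh
# Theoretical InfiniBand data rate in MB/s for a given link.
ib_rate_calc() {
  # $1 = lanes, $2 = per-lane signalling rate in Gbit/s (SDR=2.5, DDR=5)
  # 8b/10b encoding: 80% of signalling carries data; /8 converts bits->bytes.
  awk -v l="$1" -v g="$2" 'BEGIN { printf "%d\n", l * g * 0.8 * 1000 / 8 }'
}

ib_rate_calc 4 2.5   # 4x SDR -> 1000 MB/s theoretical
ib_rate_calc 4 5     # 4x DDR -> 2000 MB/s theoretical
```

Real-world throughput lands well below these peaks, so it may be worth checking what `ibstat` or `ibv_devinfo` reports for the active link speed and width before hunting through BIOS options.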
Re: [OMPI users] strange IMB runs
mpi_leave_pinned didn't help, still at ~145MB/sec.
btl_sm_eager_limit from 4096 to 8192 pushes me up to ~212MB/sec, but
pushing it past that doesn't change it anymore.

Are there any intelligent programs that can go through and test all
the different permutations of tunables for openmpi? Outside of me
just writing an ugly looping script...

On Wed, Jul 29, 2009 at 1:55 PM, Dorian Krause <doriankra...@web.de> wrote:
> Hi,
>
> --mca mpi_leave_pinned 1
>
> might help. Take a look at the FAQ for various tuning parameters.
>
> Michael Di Domenico wrote:
>> I'm not sure I understand what's actually happened here. I'm running
>> IMB on an HP superdome, just comparing the PingPong benchmark
>>
>> HP-MPI v2.3
>> Max ~ 700-800MB/sec
>>
>> OpenMPI v1.3
>> -mca btl self,sm - Max ~ 125-150MB/sec
>> -mca btl self,tcp - Max ~ 500-550MB/sec
>>
>> Is this behavior expected? Are there any tunables to get the OpenMPI
>> sockets up near HP-MPI?
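For the "ugly looping script", a brute-force sweep doesn't have to be ugly. A sketch along these lines would cover the permutations (the parameter values and the IMB path are made-up placeholders; with DRY_RUN=1, the default here, it only prints the candidate commands instead of launching jobs):

```shell
#!/bin/sh
# Hypothetical sweep over two sm-BTL tunables; set DRY_RUN=0 to
# actually launch each pingpong run instead of printing the command.
sweep_sm_tunables() {
  for eager in 2048 4096 8192 16384; do
    for maxsend in 32768 65536 131072; do
      cmd="mpirun -np 2 -mca btl self,sm \
-mca btl_sm_eager_limit $eager \
-mca btl_sm_max_send_size $maxsend ./IMB-MPI1 pingpong"
      if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$cmd"
      else
        eval "$cmd"
      fi
    done
  done
}

sweep_sm_tunables
```

The 4x3 grid here is just an example; widening the value lists (or nesting more parameters) scales it up at the cost of run time.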
Re: [OMPI users] strange IMB runs
On Thu, Jul 30, 2009 at 10:08 AM, George Bosilca wrote:
> The leave pinned will not help in this context. It can only help for devices
> capable of real RMA operations and that require pinned memory, which
> unfortunately is not the case for TCP. What is [really] strange about your
> results is that you get a 4 times better bandwidth over TCP than over shared
> memory. Over TCP there are 2 extra memory copies (compared with sm) plus a
> bunch of syscalls, so there is absolutely no reason to get better
> performance.
>
> The Open MPI version is something you compiled or it came installed with the
> OS? If you compiled it can you please provide us the configure line?

OpenMPI was compiled from source v1.3 with only a --prefix line, no other
options.
[OMPI users] strange IMB runs
I'm not sure I understand what's actually happened here. I'm running
IMB on an HP Superdome, just comparing the PingPong benchmark.

HP-MPI v2.3
Max ~ 700-800MB/sec

OpenMPI v1.3
-mca btl self,sm - Max ~ 125-150MB/sec
-mca btl self,tcp - Max ~ 500-550MB/sec

Is this behavior expected? Are there any tunables to get the OpenMPI
sockets up near HP-MPI?
Re: [OMPI users] quadrics support?
On Wed, Jul 8, 2009 at 3:33 PM, Ashley Pittman wrote:
>> When i run tping i get:
>> ELAN_EXCEOPTIOn @ --: 6 (Initialization error)
>> elan_init: Can't get capability from environment
>>
>> I am not using slurm or RMS at all, just trying to get openmpi to run
>> between two nodes.
>
> To attach to the elan a process has to have a "capability" which is a
> kernel attribute describing the size (number of nodes/ranks) of the job,
> without this you'll get errors like the one from tping. The only way to
> generate these capabilities is by using RMS, Slurm or I believe pdsh
> which can generate one and push it into the kernel before calling fork()
> to create the user application.

I didn't realize it was an MPI-type program, so I ran it using the QsNet
version of mpirun and OpenMPI. The process does start and runs through 0:
and 2:, which i assume are packet sizes, but freezes at that point.

We have an existing XC cluster from HP that we're trying to convert from
XC to standard RHEL5.3 w/ Slurm and OpenMPI. All i want to be able to do
is load RHEL5 and the Quadrics NIC drivers, and run OpenMPI jobs between
these two nodes I yanked from the cluster before we switch the whole
thing over.
Re: [OMPI users] quadrics support?
On Wed, Jul 8, 2009 at 12:33 PM, Ashley Pittman wrote:
> Is the machine configured correctly to allow non OpenMPI QsNet programs
> to run, for example tping?
>
> Which resource manager are you running, I think slurm compiled for RMS
> is essential.

I can ping via TCP/IP using the eip0 ports. When i run tping i get:

ELAN_EXCEOPTIOn @ --: 6 (Initialization error)
elan_init: Can't get capability from environment

I am not using slurm or RMS at all, just trying to get openmpi to run
between two nodes.

Using -mca btl self,tcp -mca btl_tcp_if_include eip0 i can run the jobs
no problem using sockets over the elan interface, but if i run the job
with -mca btl self,elan,tcp, below is the short snipped output:

Signal: Segmentation fault (11)
Signal code: Invalid permissions (2)
Re: [OMPI users] quadrics support?
So, first run i seem to have run into a bit of an issue. All the Quadrics
modules are compiled and loaded, and I can ping between nodes over the
quadrics interfaces. But when i try to run one of the hello mpi examples
from openmpi, i get:

first run, the process hung - killed with ctrl-c, though it doesn't seem
to actually die, and kill -9 doesn't work

second run, the process fails with

failed elan4_attach Device or resource busy
elan_allocSleepDesc Failed to allocate IRQ cookie 2a: 22 Invalid argument

all subsequent runs fail the same way, and i have to reboot the box to get
the processes to go away.

I'm not sure if this is a quadrics or openmpi issue at this point, but i
figured since there are quadrics people on the list it's a good place to
start.

On Tue, Jul 7, 2009 at 3:30 PM, Michael Di Domenico <mdidomeni...@gmail.com> wrote:
> Does OpenMPI/Quadrics require the Quadrics kernel patches in order to
> operate? Or operate at full speed? Or are the Quadrics modules
> sufficient?
>
> On Thu, Jul 2, 2009 at 1:52 PM, Ashley Pittman <ash...@pittman.co.uk> wrote:
>> On Thu, 2009-07-02 at 09:34 -0400, Michael Di Domenico wrote:
>>> Jeff,
>>>
>>> Okay, thanks. I'll give it a shot and report back. I can't
>>> contribute any code, but I can certainly do testing...
>>
>> I'm from the Quadrics stable so could certainly support a port should
>> you require it, but I don't have access to hardware either currently.
>>
>> Ashley,
>>
>> --
>> Ashley Pittman, Bath, UK.
>>
>> Padb - A parallel job inspection tool for cluster computing
>> http://padb.pittman.org.uk
Re: [OMPI users] quadrics support?
Does OpenMPI/Quadrics require the Quadrics kernel patches in order to
operate? Or operate at full speed? Or are the Quadrics modules
sufficient?

On Thu, Jul 2, 2009 at 1:52 PM, Ashley Pittman <ash...@pittman.co.uk> wrote:
> On Thu, 2009-07-02 at 09:34 -0400, Michael Di Domenico wrote:
>> Jeff,
>>
>> Okay, thanks. I'll give it a shot and report back. I can't
>> contribute any code, but I can certainly do testing...
>
> I'm from the Quadrics stable so could certainly support a port should
> you require it, but I don't have access to hardware either currently.
>
> Ashley,
>
> --
> Ashley Pittman, Bath, UK.
>
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
Re: [OMPI users] quadrics support?
Jeff,

Okay, thanks. I'll give it a shot and report back. I can't
contribute any code, but I can certainly do testing...

On Thu, Jul 2, 2009 at 9:23 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
> I see ompi/mca/btl/elan in the OMPI SVN development trunk and in the 1.3
> tree (where elan = the quadrics interface).
>
> So actually, looking at the 1.3.x README, I see configure switches like
> "--with-elan" that specifies where the Elan (Quadrics) headers and libraries
> live. I have no Quadrics networks and didn't pay attention to this
> development at all (obviously ;-) ) -- you might want to give it a shot and
> see how well it performs. Meaning: I'm sure it works or UT wouldn't have
> pushed this stuff upstream, but I have no idea how well tuned it is.
>
> If you build OMPI properly, you should be able to tell if Quadrics support
> is included via
>
>    ompi_info | grep elan
>
> You should see a BTL line for elan (i.e., a BTL plugin for "elan" is
> installed and functional). Although OMPI should automatically pick elan for
> MPI communications, you can force OMPI to pick it via:
>
>    mpirun --mca btl elan,self ...
>
> Quadrics networks should also qualify for Open MPI's "other" type of network
> support (the MTL, instead of the BTL). MTL level support can typically give
> slightly better performance on some types of networks, but it doesn't look
> like anyone did any work in this area. Contributions are always welcome, of
> course! :-)
>
> On Jul 2, 2009, at 9:12 AM, Michael Di Domenico wrote:
>
>> Jeff,
>>
>> Thanks, honestly though if the patches haven't been pulled mainline,
>> we are not likely to bring it internally. I was hoping that quadrics
>> support was mainline, but the documentation was out of date.
>>
>> On Thu, Jul 2, 2009 at 8:08 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
>>> George --
>>>
>>> I know that U. Tennessee did some work in this area; did it ever
>>> materialize?
>>>
>>> On Jul 1, 2009, at 4:49 PM, Michael Di Domenico wrote:
>>>
>>>> Did the quadrics support for OpenMPI ever materialize? I can't find
>>>> any documentation on the web about it and the few mailing list
>>>> messages I came across showed some hints that it might be in progress
>>>> but that was way back in 2007
>>>>
>>>> Thanks
>>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems
Re: [OMPI users] quadrics support?
Jeff,

Thanks. Honestly though, if the patches haven't been pulled mainline,
we are not likely to bring it internally. I was hoping that quadrics
support was mainline, but the documentation was out of date.

On Thu, Jul 2, 2009 at 8:08 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
> George --
>
> I know that U. Tennessee did some work in this area; did it ever
> materialize?
>
> On Jul 1, 2009, at 4:49 PM, Michael Di Domenico wrote:
>
>> Did the quadrics support for OpenMPI ever materialize? I can't find
>> any documentation on the web about it and the few mailing list
>> messages I came across showed some hints that it might be in progress
>> but that was way back in 2007
>>
>> Thanks
>
> --
> Jeff Squyres
> Cisco Systems
[OMPI users] quadrics support?
Did the quadrics support for OpenMPI ever materialize? I can't find
any documentation on the web about it, and the few mailing list
messages I came across showed some hints that it might be in progress,
but that was way back in 2007.

Thanks