Re: [OMPI users] local rank to rank comms

2019-03-20 Thread Michael Di Domenico
ould > btl/ofi also be used for intra node communications ?) > > > mpirun --mca pml_base_verbose 10 --mca btl_base_verbose 10 --mca > mtl_base_verbose 10 ... > > should tell you what is used (feel free to compress and post the full > output if you have some hard time understandi
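The verbosity flags quoted in the snippet above can be assembled programmatically. The following is a minimal sketch, not an official tool: the helper name `build_mpirun_command` and its parameters are hypothetical, and only the `--mca pml ob1` and `--mca <framework>_base_verbose 10` options that appear in this thread are used.

```python
def build_mpirun_command(app_args, nprocs, force_ob1=False, verbose_level=10):
    """Return an mpirun argument list that reports which PML/BTL/MTL is selected.

    Hypothetical helper; the flags themselves are the ones quoted in the thread.
    """
    cmd = ["mpirun", "-n", str(nprocs)]
    if force_ob1:
        # Per the thread, forcing pml/ob1 makes btl/vader handle intra-node traffic.
        cmd += ["--mca", "pml", "ob1"]
    for framework in ("pml", "btl", "mtl"):
        cmd += ["--mca", f"{framework}_base_verbose", str(verbose_level)]
    cmd += list(app_args)
    return cmd


if __name__ == "__main__":
    # Print the command rather than running it, so it can be inspected first.
    print(" ".join(build_mpirun_command(["IMB-MPI1", "alltoallv"], 16, force_ob1=True)))
```

The sketch only prints the command line; actually running it and reading the verbose output for lines naming the selected components is left to the operator.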

Re: [OMPI users] local rank to rank comms

2019-03-11 Thread Michael Di Domenico
On Mon, Mar 11, 2019 at 12:09 PM Gilles Gouaillardet wrote: > You can force > mpirun --mca pml ob1 ... > And btl/vader (shared memory) will be used for intra node communications ... > unless MPI tasks are from different jobs (read MPI_Comm_spawn()) if i run mpirun -n 16 IMB-MPI1 alltoallv

Re: [OMPI users] local rank to rank comms

2019-03-11 Thread Michael Di Domenico
On Mon, Mar 11, 2019 at 12:19 PM Ralph H Castain wrote: > OFI uses libpsm2 underneath it when omnipath detected > > > On Mar 11, 2019, at 9:06 AM, Gilles Gouaillardet > > wrote: > > It might show that pml/cm and mtl/psm2 are used. In that case, then yes, > > the OmniPath library is used even

Re: [OMPI users] local rank to rank comms

2019-03-11 Thread Michael Di Domenico
On Mon, Mar 11, 2019 at 11:51 AM Ralph H Castain wrote: > You are probably using the ofi mtl - could be psm2 uses loopback method? according to ompi_info i do in fact have mtl's ofi,psm,psm2. i haven't changed any of the defaults, so are you saying order to change the behaviour i have to run

[OMPI users] local rank to rank comms

2019-03-11 Thread Michael Di Domenico
i have a user that's claiming when two ranks on the same node want to talk with each other, they're using the NIC to talk rather than just talking directly. i've never had to test such a scenario. is there a way for me to prove one way or another whether two ranks are talking through say the

Re: [OMPI users] pmix and srun

2019-01-18 Thread Michael Di Domenico
s a typo in the v2.2.1 release. Sadly, our Slurm > > plugin folks seem to be off somewhere for awhile and haven’t been testing > > it. Sigh. > > > > I’ll patch the branch and let you know - we’d appreciate the feedback. > > Ralph > > > > > >>

Re: [OMPI users] Fwd: pmix and srun

2019-01-18 Thread Michael Di Domenico
> PMIX_MCA_pmix_client_event_verbose=5 > PMIX_MCA_pmix_server_event_verbose=5 > OMPI_MCA_pmix_base_verbose=10 > > to your environment and see if that provides anything useful. > > > On Jan 18, 2019, at 12:09 PM, Michael Di Domenico > > wrote: > > > > i compiled pmix slurm op

[OMPI users] Fwd: pmix and srun

2019-01-18 Thread Michael Di Domenico
i compiled pmix slurm openmpi ---pmix ./configure --prefix=/hpc/pmix/2.2 --with-munge=/hpc/munge/0.5.13 --disable-debug ---slurm ./configure --prefix=/hpc/slurm/18.08 --with-munge=/hpc/munge/0.5.13 --with-pmix=/hpc/pmix/2.2 ---openmpi ./configure --prefix=/hpc/ompi/3.1 --with-hwloc=external

Re: [OMPI users] OpenFabrics warning

2018-11-12 Thread Michael Di Domenico
On Mon, Nov 12, 2018 at 8:08 AM Andrei Berceanu wrote: > > Running a CUDA+MPI application on a node with 2 K80 GPUs, I get the following > warnings: > > -- > WARNING: There is at least non-excluded one OpenFabrics device

Re: [OMPI users] Problem running with UCX/oshmem on single node?

2018-05-14 Thread Michael Di Domenico
On Wed, May 9, 2018 at 9:45 PM, Howard Pritchard wrote: > > You either need to go and buy a connectx4/5 HCA from mellanox (and maybe a > switch), and install that > on your system, or else install xpmem (https://github.com/hjelmn/xpmem). > Note there is a bug right now > in

[OMPI users] shmem

2018-05-09 Thread Michael Di Domenico
before i debug ucx further (cause it's totally not working for me), i figured i'd check to see if it's *really* required to use shmem inside of openmpi. i'm pretty sure the answer is yes, but i wanted to double check.

Re: [OMPI users] openmpi/slurm/pmix

2018-04-25 Thread Michael Di Domenico
On Mon, Apr 23, 2018 at 6:07 PM, r...@open-mpi.org wrote: > Looks like the problem is that you didn’t wind up with the external PMIx. The > component listed in your error is the internal PMIx one which shouldn’t have > built given that configure line. > > Check your

[OMPI users] openmpi/slurm/pmix

2018-04-23 Thread Michael Di Domenico
i'm trying to get slurm 17.11.5 and openmpi 3.0.1 working with pmix. everything compiled, but when i run something i get: symbol lookup error: /openmpi/mca_pmix_pmix2x.so: undefined symbol: opal_libevent2022_evthread_use_pthreads i'm more than sure i did something wrong, but i'm not sure what,

Re: [OMPI users] disabling libraries?

2018-04-10 Thread Michael Di Domenico
On Sat, Apr 7, 2018 at 3:50 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote: > On Apr 6, 2018, at 8:12 AM, Michael Di Domenico <mdidomeni...@gmail.com> > wrote: >> it would be nice if openmpi had (or may already have) a simple switch >> that lets me disable

Re: [OMPI users] disabling libraries?

2018-04-06 Thread Michael Di Domenico
On Thu, Apr 5, 2018 at 7:59 PM, Gilles Gouaillardet wrote: > That being said, the error suggest mca_oob_ud.so is a module from a > previous install, > Open MPI was not built on the system it is running, or libibverbs.so.1 > has been removed after > Open MPI was

[OMPI users] disabling libraries?

2018-04-05 Thread Michael Di Domenico
i'm trying to compile openmpi to support all of our interconnects, psm/openib/mxm/etc this works fine, openmpi finds all the libs, compiles and runs on each of the respective machines however, we don't install the libraries for everything everywhere so when i run things like ompi_info and

[OMPI users] openmpi hang on IB disconnect

2018-01-17 Thread Michael Di Domenico
openmpi-2.0.2 running on rhel 7.4 with qlogic QDR infiniband switches/adapters, also using slurm i have a user that's running a job over multiple days. unfortunately after a few days at random the job will seemingly hang. the latest instance was caused by an infiniband adapter that went offline

[OMPI users] openmpi mgmt traffic

2017-10-11 Thread Michael Di Domenico
my cluster nodes are connected on 1g ethernet eth0/eth1 and via infiniband rdma and ib0 my understanding is that openmpi will detect all these interfaces. using eth0/eth1 for connection setup and use rdma for msg passing what would be the appropriate command line parameters to tell openmpi to

[OMPI users] alltoallv

2017-10-10 Thread Michael Di Domenico
i'm getting stuck trying to run some fairly large IMB-MPI alltoall tests under openmpi 2.0.2 on rhel 7.4 i have two different clusters, one running mellanox fdr10 and one running qlogic qdr if i issue mpirun -n 1024 ./IMB-MPI1 -npmin 1024 -iter 1 -mem 2.001 alltoallv the job just stalls after

Re: [OMPI users] disable slurm/munge from mpirun

2017-06-23 Thread Michael Di Domenico
On Thu, Jun 22, 2017 at 12:41 PM, r...@open-mpi.org wrote: > I gather you are using OMPI 2.x, yes? And you configured it > --with-pmi=, then moved the executables/libs to your > workstation? correct > I suppose I could state the obvious and say “don’t do that - just rebuild

Re: [OMPI users] disable slurm/munge from mpirun

2017-06-22 Thread Michael Di Domenico
On Thu, Jun 22, 2017 at 10:43 AM, John Hearns via users wrote: > Having had some problems with ssh launching (a few minutes ago) I can > confirm that this works: > > --mca plm_rsh_agent "ssh -v" this doesn't do anything for me if i set OMPI_MCA_sec=^munge i can clear

Re: [OMPI users] disable slurm/munge from mpirun

2017-06-22 Thread Michael Di Domenico
sh > > I've been fooling with this myself recently, in the contect of a PBS cluster > > On 22 June 2017 at 16:16, Michael Di Domenico <mdidomeni...@gmail.com> > wrote: >> >> is it possible to disable slurm/munge/psm/pmi(x) from the mpirun >> command line or (

[OMPI users] disable slurm/munge from mpirun

2017-06-22 Thread Michael Di Domenico
is it possible to disable slurm/munge/psm/pmi(x) from the mpirun command line or (better) using environment variables? i'd like to use the installed version of openmpi i have on a workstation, but it's linked with slurm from one of my clusters. mpi/slurm work just fine on the cluster, but when i

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-25 Thread Michael Di Domenico
On Mon, Jul 25, 2016 at 4:53 AM, Gilles Gouaillardet wrote: > > as a workaround, you can configure without -noswitcherror. > > after you ran configure, you have to manually patch the generated 'libtool' > file and add the line with pgcc*) and the next line like this : > > /* if

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-22 Thread Michael Di Domenico
remove the "-pthread" from libslurm.la and libpmi.la >> >> On 07/11/2016 02:54 PM, Michael Di Domenico wrote: >>> >>> I'm trying to get openmpi compiled using the PGI compiler. >>> >>> the configure goes through and the code starts to co

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-14 Thread Michael Di Domenico
On Mon, Jul 11, 2016 at 9:52 AM, Åke Sandgren wrote: > Looks like you are compiling with slurm support. > > If so, you need to remove the "-pthread" from libslurm.la and libpmi.la i don't see a configure option in slurm to disable pthreads, so i'm not sure this is

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-14 Thread Michael Di Domenico
On Thu, Jul 14, 2016 at 9:47 AM, Michael Di Domenico <mdidomeni...@gmail.com> wrote: > Have 1.10.3 unpacked, ran through the configure using the same command > line options as 1.10.2 > > but it fails even earlier in the make process at > > Entering openmpi-1.10.3/opal/a

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-14 Thread Michael Di Domenico
3 instead ? > > btw, do you have a license for the pgCC C++ compiler ? > fwiw, FreePGI on OSX has no C++ license and PGI C and gnu g++ does not work > together out of the box, hopefully I will have a fix ready sometimes this > week > > Cheers, > > Gilles > > > On Monday,

Re: [OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-11 Thread Michael Di Domenico
On Mon, Jul 11, 2016 at 9:11 AM, Gilles Gouaillardet wrote: > Can you try the latest 1.10.3 instead ? i can but it'll take a few days to pull the software inside. > btw, do you have a license for the pgCC C++ compiler ? > fwiw, FreePGI on OSX has no C++ license

[OMPI users] openmpi 1.10.2 and PGI 15.9

2016-07-11 Thread Michael Di Domenico
I'm trying to get openmpi compiled using the PGI compiler. the configure goes through and the code starts to compile, but then gets hung up with entering: openmpi-1.10.2/opal/mca/common/pmi CC common_pmi.lo CCLD libmca_common_pmi.la pgcc-Error-Unknown switch: -pthread

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Thu, Mar 17, 2016 at 12:15 PM, Cabral, Matias A wrote: > I was looking for lines like" [nodexyz:17085] selected cm best priority 40" > and " [nodexyz:17099] select: component psm selected" this may have turned up more than i expected. i recompiled openmpi v1.8.4

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Thu, Mar 17, 2016 at 12:52 PM, Jeff Squyres (jsquyres) wrote: > Can you send all the information listed here? > > https://www.open-mpi.org/community/help/ > > (including the full output from the run with the PML/BTL/MTL/etc. verbosity) > > This will allow Matias to look

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Thu, Mar 17, 2016 at 12:15 PM, Cabral, Matias A wrote: > I was looking for lines like" [nodexyz:17085] selected cm best priority 40" > and " [nodexyz:17099] select: component psm selected" i see cm best priority 20, which seems to relate to ob1 being selected. i

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Wed, Mar 16, 2016 at 4:49 PM, Cabral, Matias A wrote: > I didn't go into the code to see who is actually calling this error message, > but I suspect this may be a generic error for "out of memory" kind of thing > and not specific to the que pair. To confirm please

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Michael Di Domenico
On Wed, Mar 16, 2016 at 3:37 PM, Cabral, Matias A wrote: > Hi Michael, > > I may be missing some context, if you are using the qlogic cards you will > always want to use the psm mtl (-mca pml cm -mca mtl psm) and not openib btl. > As Tom suggest, confirm the limits

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Michael Di Domenico
On Wed, Mar 16, 2016 at 12:12 PM, Elken, Tom wrote: > Hi Mike, > > In this file, > $ cat /etc/security/limits.conf > ... > < do you see at the end ... > > > * hard memlock unlimited > * soft memlock unlimited > # -- All InfiniBand Settings End here -- > ? Yes. I double

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Michael Di Domenico
On Thu, Mar 10, 2016 at 11:54 AM, Michael Di Domenico <mdidomeni...@gmail.com> wrote: > when i try to run an openmpi job with >128 ranks (16 ranks per node) > using alltoall or alltoallv, i'm getting an error that the process was > unable to get a queue pair. > > i've check

[OMPI users] locked memory and queue pairs

2016-03-10 Thread Michael Di Domenico
when i try to run an openmpi job with >128 ranks (16 ranks per node) using alltoall or alltoallv, i'm getting an error that the process was unable to get a queue pair. i've checked the max locked memory settings across my machines; using ulimit -l in and outside of mpirun and they're all set to
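One way to compare the locked-memory limit inside and outside of mpirun is to have each process print it. This is a minimal sketch that uses only the Python standard library; the script name `check_memlock.py` used below is hypothetical. Note that `getrlimit` reports bytes, while `ulimit -l` reports KiB.

```python
import resource
import socket


def describe_memlock_limit():
    """Return the (soft, hard) RLIMIT_MEMLOCK values as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def fmt(value):
        # RLIM_INFINITY is what `ulimit -l` shows as "unlimited".
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value // 1024} KiB"

    return fmt(soft), fmt(hard)


if __name__ == "__main__":
    soft, hard = describe_memlock_limit()
    print(f"{socket.gethostname()}: memlock soft={soft} hard={hard}")
```

Running it once directly on a node and once as, say, `mpirun -n 2 python3 check_memlock.py` would show whether the launched processes inherit a different limit than an interactive shell.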

[OMPI users] slurm openmpi 1.8.3 core bindings

2015-01-30 Thread Michael Di Domenico
I'm trying to get slurm and openmpi to cooperate when running multi thread jobs. i'm sure i'm doing something wrong, but i can't figure out what my node configuration is 2 nodes 2 sockets 6 cores per socket i want to run sbatch -N2 -n 8 --ntasks-per-node=4 --cpus-per-task=3 -w node1,node2
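The request above can be sanity-checked with a little arithmetic before looking at binding behaviour. This is a sketch only; the function name `check_request` is hypothetical, and the numbers are the ones stated in the message (2 nodes, 2 sockets, 6 cores per socket, `-n 8 --ntasks-per-node=4 --cpus-per-task=3`).

```python
def check_request(nodes, sockets, cores_per_socket,
                  ntasks, ntasks_per_node, cpus_per_task):
    """Compare the cores a job asks for per node with the cores a node has."""
    cores_per_node = sockets * cores_per_socket
    cores_requested_per_node = ntasks_per_node * cpus_per_task
    return {
        "cores_per_node": cores_per_node,
        "cores_requested_per_node": cores_requested_per_node,
        "fits_on_node": cores_requested_per_node <= cores_per_node,
        "task_count_consistent": ntasks == nodes * ntasks_per_node,
    }


if __name__ == "__main__":
    print(check_request(nodes=2, sockets=2, cores_per_socket=6,
                        ntasks=8, ntasks_per_node=4, cpus_per_task=3))
```

With these numbers each node is asked for 4 x 3 = 12 cores, which is exactly the 2 x 6 = 12 cores available, so the request itself fits; any problem would lie in how the tasks are bound, not in the totals.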

Re: [OMPI users] ipath_userinit errors

2014-11-06 Thread Michael Di Domenico
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Michael Di >> Domenico >> Sent: Tuesday, November 4, 2014 8:35 AM >> To: Open MPI Users >> Subject: [OMPI users] ipath_userinit errors >> >> I'm getting the below message on my cluster(s). It seem

[OMPI users] ipath_userinit errors

2014-11-04 Thread Michael Di Domenico
I'm getting the below message on my cluster(s). It seems to only happen when I try to use more than 64 nodes (16-cores each). The clusters are running RHEL 6.5 with Slurm and Openmpi-1.6.5 with PSM. I'm using the OFED versions included with RHEL for infiniband support. ipath_userinit:

Re: [OMPI users] debugs for jobs not starting

2012-10-12 Thread Michael Di Domenico
: /tmp [...above lines only come out once...] On Fri, Oct 12, 2012 at 9:27 AM, Michael Di Domenico <mdidomeni...@gmail.com> wrote: > what isn't working is when i fire off an MPI job with over 800 ranks, > they don't all actually start up a process > > fe, if i do srun -n 1024 --

Re: [OMPI users] debugs for jobs not starting

2012-10-12 Thread Michael Di Domenico
Timeout) >> >> i didn't see anything useful in /proc under those file descriptors, >> but perhaps i missed something i don't know to look for >> >> On Thu, Oct 11, 2012 at 12:06 PM, Michael Di Domenico >> <mdidomeni...@gmail.com> wrote: >> > too add a

Re: [OMPI users] debugs for jobs not starting

2012-10-11 Thread Michael Di Domenico
pl, i do see the orte process, but nothing in the > logs about why it failed to launch xhpl > > > > On Thu, Oct 11, 2012 at 11:49 AM, Michael Di Domenico > <mdidomeni...@gmail.com> wrote: >> I'm trying to diagnose an MPI job (in this case xhpl), that fails to >> s

Re: [OMPI users] debugs for jobs not starting

2012-10-11 Thread Michael Di Domenico
, Oct 11, 2012 at 11:49 AM, Michael Di Domenico <mdidomeni...@gmail.com> wrote: > I'm trying to diagnose an MPI job (in this case xhpl), that fails to > start when the rank count gets fairly high into the thousands. > > My symptom is the jobs fires up via slurm, and I can see all th

[OMPI users] debugs for jobs not starting

2012-10-11 Thread Michael Di Domenico
I'm trying to diagnose an MPI job (in this case xhpl), that fails to start when the rank count gets fairly high into the thousands. My symptom is the jobs fires up via slurm, and I can see all the xhpl processes on the nodes, but it never kicks over to the next process. My question is, what

Re: [OMPI users] srun and openmpi

2011-04-29 Thread Michael Di Domenico
Certainly, i reached out to several contacts I have inside qlogic (i used to work there)... On Fri, Apr 29, 2011 at 10:30 AM, Ralph Castain wrote: > Hi Michael > > I'm told that the Qlogic contacts we used to have are no longer there. Since > you obviously are a customer, can

Re: [OMPI users] srun and openmpi

2011-04-29 Thread Michael Di Domenico
On Fri, Apr 29, 2011 at 10:01 AM, Michael Di Domenico <mdidomeni...@gmail.com> wrote: > On Fri, Apr 29, 2011 at 4:52 AM, Ralph Castain <r...@open-mpi.org> wrote: >> Hi Michael >> >> Please see the attached updated patch to try for 1.5.3. I mistakenly

Re: [OMPI users] srun and openmpi

2011-04-29 Thread Michael Di Domenico
On Fri, Apr 29, 2011 at 4:52 AM, Ralph Castain wrote: > Hi Michael > > Please see the attached updated patch to try for 1.5.3. I mistakenly free'd > the envar after adding it to the environ :-/ The patch works great, i can now see the precondition environment variable if i do

Re: [OMPI users] srun and openmpi

2011-04-28 Thread Michael Di Domenico
On Thu, Apr 28, 2011 at 9:03 AM, Ralph Castain <r...@open-mpi.org> wrote: > > On Apr 28, 2011, at 6:49 AM, Michael Di Domenico wrote: > >> On Wed, Apr 27, 2011 at 11:47 PM, Ralph Castain <r...@open-mpi.org> wrote: >>> >>> On Apr 27, 2011, at 1:06 PM, Mi

Re: [OMPI users] srun and openmpi

2011-04-28 Thread Michael Di Domenico
On Wed, Apr 27, 2011 at 11:47 PM, Ralph Castain <r...@open-mpi.org> wrote: > > On Apr 27, 2011, at 1:06 PM, Michael Di Domenico wrote: > >> On Wed, Apr 27, 2011 at 2:46 PM, Ralph Castain <r...@open-mpi.org> wrote: >>> >>> On Apr 27, 2011, at 12:38 P

Re: [OMPI users] srun and openmpi

2011-04-27 Thread Michael Di Domenico
On Wed, Apr 27, 2011 at 2:46 PM, Ralph Castain <r...@open-mpi.org> wrote: > > On Apr 27, 2011, at 12:38 PM, Michael Di Domenico wrote: > >> On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain <r...@open-mpi.org> wrote: >>> >>> On Apr 27,

Re: [OMPI users] srun and openmpi

2011-04-27 Thread Michael Di Domenico
On Wed, Apr 27, 2011 at 2:25 PM, Ralph Castain <r...@open-mpi.org> wrote: > > On Apr 27, 2011, at 10:09 AM, Michael Di Domenico wrote: > >> Was this ever committed to the OMPI src as something not having to be >> run outside of OpenMPI, but as part of the PSM

Re: [OMPI users] srun and openmpi

2011-04-27 Thread Michael Di Domenico
value as many times as you like - it doesn't have to be >> unique for each job. There is nothing magic about the value itself. >> >> On Dec 30, 2010, at 2:11 PM, Michael Di Domenico wrote: >> >>> How early does this need to run? Can I run it as part of a task &

[OMPI users] Ofed v1.5.3?

2011-04-16 Thread Michael Di Domenico
Does OpenMPI v1.5.3 support Ofed v.1.5.3.1 ?

Re: [OMPI users] alltoall messages > 2^26

2011-04-11 Thread Michael Di Domenico
buffering under the hood, so even though you're > sending array's over 2^26 in size, it may require more than that for MPI to > actually send it. > > On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico < > mdidomeni...@gmail.com> wrote: > >> Has anyone seen an issue wh

Re: [OMPI users] alltoall messages > 2^26

2011-04-05 Thread Michael Di Domenico
er 2^26 in size, it may require more than that for MPI to > actually send it. > > On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico <mdidomeni...@gmail.com> > wrote: >> >> Has anyone seen an issue where OpenMPI/Infiniband hangs when sending >> messages over 2^2

Re: [OMPI users] srun and openmpi

2011-01-25 Thread Michael Di Domenico
> HPC-3, LANL > > On Tue, 25 Jan 2011, Michael Di Domenico wrote: > >> Thanks. We're only seeing it on machines with Ethernet only as the >> interconnect. fortunately for us that only equates to one small >> machine, but it's still annoying. unfortunately, i don't hav

Re: [OMPI users] srun and openmpi

2011-01-25 Thread Michael Di Domenico
, 2011 at 1:41 PM, Nathan Hjelm <hje...@lanl.gov> wrote: > I am seeing similar issues on our slurm clusters. We are looking into the > issue. > > -Nathan > HPC-3, LANL > > On Tue, 11 Jan 2011, Michael Di Domenico wrote: > >> Any ideas on what might be causing this

Re: [OMPI users] srun and openmpi

2011-01-11 Thread Michael Di Domenico
Any ideas on what might be causing this one? Or at least what additional debug information someone might need? On Fri, Jan 7, 2011 at 4:03 PM, Michael Di Domenico <mdidomeni...@gmail.com> wrote: > I'm still testing the slurm integration, which seems to work fine so > far. How

Re: [OMPI users] CQ errors

2011-01-10 Thread Michael Di Domenico
2011/1/10 Peter Kjellström <c...@nsc.liu.se>: > On Monday, January 10, 2011 03:06:06 pm Michael Di Domenico wrote: >> I'm not sure if these are being reported from OpenMPI or through >> OpenMPI from OpenFabrics, but i figured this would be a good place to >> start >

[OMPI users] CQ errors

2011-01-10 Thread Michael Di Domenico
I'm not sure if these are being reported from OpenMPI or through OpenMPI from OpenFabrics, but i figured this would be a good place to start On one node we received the below errors, i'm not sure i understand the error sequence, hopefully someone can shed some light on what happened.

Re: [OMPI users] srun and openmpi

2011-01-07 Thread Michael Di Domenico
kay. >>>> Looks like this: >>>> >>>> $ ./psm_keygen >>>> OMPI_MCA_orte_precondition_transports=0099b3eaa2c1547e-afb287789133a954 >>>> $ >>>> >>>> You compile the program with the usual mpicc. >>>> >>
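A value in the same shape as the `psm_keygen` output quoted above can be produced in a few lines. This is a sketch under assumptions: it assumes the only requirement is the format visible in the example (two 16-hex-digit halves joined by a dash), and the function names are hypothetical. It is not the `psm_keygen` program discussed in the thread.

```python
import secrets


def make_precondition_key():
    """Return two random 64-bit values as 16 hex digits each, joined by '-'."""
    return f"{secrets.token_hex(8)}-{secrets.token_hex(8)}"


def export_line(key):
    """Format the key as the environment assignment shown in the thread."""
    return f"OMPI_MCA_orte_precondition_transports={key}"


if __name__ == "__main__":
    print(export_line(make_precondition_key()))
```

As noted elsewhere in this thread, every process in a job has to see the same value, so the key would be generated once before launch and exported to all ranks, not generated per process.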

Re: [OMPI users] srun and openmpi

2010-12-30 Thread Michael Di Domenico
generated by >>> mpirun as it is shared info - i.e., every proc has to get the same value. >>> >>> I can create a patch that will do this for the srun direct-launch scenario, >>> if you want to try it. Would be later today, though. >>> >>> >>

Re: [OMPI users] srun and openmpi

2010-12-30 Thread Michael Di Domenico
patch that will do this for the srun direct-launch scenario, > if you want to try it. Would be later today, though. > > > On Dec 30, 2010, at 10:31 AM, Michael Di Domenico wrote: > >> Well maybe not hooray, yet. I might have jumped the gun a bit, it's >> looking like sr

Re: [OMPI users] srun and openmpi

2010-12-30 Thread Michael Di Domenico
in the environment) PML add procs failed --> Returned "Error" (-1) instead of "Success" (0) Turn off PSM and srun works fine On Thu, Dec 30, 2010 at 5:13 PM, Ralph Castain <r...@open-mpi.org> wrote: > Hooray! > > On Dec 30, 2010, at 9:57 AM, Michael Di Domenico wrote: >

Re: [OMPI users] srun and openmpi

2010-12-30 Thread Michael Di Domenico
I think i take it all back. I just tried it again and it seems to work now. I'm not sure what I changed (between my first and this msg), but it does appear to work now. On Thu, Dec 30, 2010 at 4:31 PM, Michael Di Domenico <mdidomeni...@gmail.com> wrote: > Yes that's true, error mess

Re: [OMPI users] srun and openmpi

2010-12-30 Thread Michael Di Domenico
best guess is that the port reservation didn't get passed down to the MPI > procs properly - but that's just a guess. > > > On Dec 23, 2010, at 12:46 PM, Michael Di Domenico wrote: > >> Can anyone point me towards the most recent documentation for using >> srun and op

[OMPI users] srun and openmpi

2010-12-23 Thread Michael Di Domenico
Can anyone point me towards the most recent documentation for using srun and openmpi? I followed what i found on the web with enabling the MpiPorts config in slurm and using the --resv-ports switch, but I'm getting an error from openmpi during setup. I'm using Slurm 2.1.15 and Openmpi 1.5 w/PSM

[OMPI users] openmpi v1.5?

2010-07-19 Thread Michael Di Domenico
Since I am a SVN neophyte can anyone tell me when openmpi 1.5 is scheduled for release? And whether the Slurm srun changes are going to make it in? thanks

[OMPI users] flex.exe

2010-01-21 Thread Michael Di Domenico
openmpi-1.4.1/contrib/platform/win32/bin/flex.exe I understand this file might be required for building on windows, since I'm not I can just delete the file without issue. However, for those of us under import restrictions, where binaries are not allowed in, this file causes me to open the

Re: [OMPI users] openmpi 1.4 and barrier

2009-10-01 Thread Michael Di Domenico
Hmm, i don't recall seeing that... On Thu, Oct 1, 2009 at 1:51 PM, Jeff Squyres <jsquy...@cisco.com> wrote: > FWIW, I saw this bug to have race-condition-like behavior. I could run a > few times and then it would work. > > On Oct 1, 2009, at 1:42 PM, Michael Di Domenico wrote:

Re: [OMPI users] openmpi 1.4 and barrier

2009-10-01 Thread Michael Di Domenico
On Thu, Oct 1, 2009 at 1:37 PM, Jeff Squyres <jsquy...@cisco.com> wrote: > On Oct 1, 2009, at 1:24 PM, Michael Di Domenico wrote: > >> I just upgraded to the devel snapshot of 1.4a1r22031 >> >> when i run a simple hello world with a barrier i ge

[OMPI users] openmpi 1.4 and barrier

2009-10-01 Thread Michael Di Domenico
I just upgraded to the devel snapshot of 1.4a1r22031 when i run a simple hello world with a barrier i get btl_tcp_endpoint.c:484:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier if i pull the barrier out the hello world runs fine interestingly enough, i can run IMB

Re: [OMPI users] strange IMB runs

2009-08-14 Thread Michael Di Domenico
> One of the differences among MPI implementations is the default placement of > processes within the node. E.g., should processes by default be collocated > on cores of the same socket or on cores of different sockets? I don't know > if that issue is applicable here (that is, HP MPI vs Open MPI

Re: [OMPI users] strange IMB runs

2009-08-14 Thread Michael Di Domenico
On Thu, Aug 13, 2009 at 1:51 AM, Eugene Loh wrote: > Also, I'm puzzled why you should see better results by changing > btl_sm_eager_limit. That shouldn't change long-message bandwidth, but only > the message size at which one transitions from short to long messages. If >

Re: [OMPI users] strange IMB runs

2009-08-13 Thread Michael Di Domenico
On Thu, Aug 13, 2009 at 1:51 AM, Eugene Loh wrote: >>Is this behavior expected? Are there any tunables to get the OpenMPI >>sockets up near HP-MPI? > > First, I want to understand the configuration. It's just a single node. No > interconnect (InfiniBand or Ethernet or

Re: [OMPI users] strange IMB runs

2009-08-12 Thread Michael Di Domenico
On Thu, Aug 6, 2009 at 9:30 AM, Michael Di Domenico<mdidomeni...@gmail.com> wrote: > Here's an interesting data point. I installed the RHEL rpm version of > OpenMPI 1.2.7-6 for ia64 > > mpirun -np 2 -mca btl self,sm -mca mpi_paffinity_alone 1 -mca > mpi_leave_pinned 1 $

[OMPI users] x4100 with IB

2009-08-07 Thread Michael Di Domenico
I have several Sun x4100 with Infiniband which appear to be running at 400MB/sec instead of 800MB/sec. It's a freshly reformatted cluster converting from solaris to linux. We also reset the bios settings with "load optimal defaults". Does anyone know which bios setting i changed to dump the BW?

Re: [OMPI users] strange IMB runs

2009-07-31 Thread Michael Di Domenico
of me just writing an ugly looping script... On Wed, Jul 29, 2009 at 1:55 PM, Dorian Krause<doriankra...@web.de> wrote: > Hi, > > --mca mpi_leave_pinned 1 > > might help. Take a look at the FAQ for various tuning parameters. > > > Michael Di Domenico wrote: >>

Re: [OMPI users] strange IMB runs

2009-07-30 Thread Michael Di Domenico
On Thu, Jul 30, 2009 at 10:08 AM, George Bosilca wrote: > The leave pinned will not help in this context. It can only help for devices > capable of real RMA operations and that require pinned memory, which > unfortunately is not the case for TCP. What is [really] strange

[OMPI users] strange IMB runs

2009-07-29 Thread Michael Di Domenico
I'm not sure I understand what's actually happened here. I'm running IMB on an HP superdome, just comparing the PingPong benchmark HP-MPI v2.3 Max ~ 700-800MB/sec OpenMPI v1.3 -mca btl self,sm - Max ~ 125-150MB/sec -mca btl self,tcp - Max ~ 500-550MB/sec Is this behavior expected? Are there

Re: [OMPI users] quadrics support?

2009-07-08 Thread Michael Di Domenico
On Wed, Jul 8, 2009 at 3:33 PM, Ashley Pittman wrote: >> When i run tping i get: >> ELAN_EXCEOPTIOn @ --: 6 (Initialization error) >> elan_init: Can't get capability from environment >> >> I am not using slurm or RMS at all, just trying to get openmpi to run >> between two

Re: [OMPI users] quadrics support?

2009-07-08 Thread Michael Di Domenico
On Wed, Jul 8, 2009 at 12:33 PM, Ashley Pittman wrote: > Is the machine configured correctly to allow non OpenMPI QsNet programs > to run, for example tping? > > Which resource manager are you running, I think slurm compiled for RMS > is essential. I can ping via TCP/IP

Re: [OMPI users] quadrics support?

2009-07-07 Thread Michael Di Domenico
the processes to go away I'm not sure if this is a quadrics or openmpi issue at this point, but i figured since there are quadrics people on the list its a good place to start On Tue, Jul 7, 2009 at 3:30 PM, Michael Di Domenico<mdidomeni...@gmail.com> wrote: > Does OpenMPI/Quadrics require the Quadri

Re: [OMPI users] quadrics support?

2009-07-07 Thread Michael Di Domenico
Does OpenMPI/Quadrics require the Quadrics Kernel patches in order to operate? Or operate at full speed or are the Quadrics modules sufficient? On Thu, Jul 2, 2009 at 1:52 PM, Ashley Pittman<ash...@pittman.co.uk> wrote: > On Thu, 2009-07-02 at 09:34 -0400, Michael Di Domenico wrote

Re: [OMPI users] quadrics support?

2009-07-02 Thread Michael Di Domenico
e anyone did any work in this area. Contributions are always welcome, of > course! :-) > > > > On Jul 2, 2009, at 9:12 AM, Michael Di Domenico wrote: > >> Jeff, >> >> Thanks, honestly though if the patches haven't been pulled mainline, >> we are not likely

Re: [OMPI users] quadrics support?

2009-07-02 Thread Michael Di Domenico
-- > > I know that U. Tennessee did some work in this area; did it ever > materialize? > > > On Jul 1, 2009, at 4:49 PM, Michael Di Domenico wrote: > >> Did the quadrics support for OpenMPI ever materialize? I can't find >> any documentation on the web about it a

[OMPI users] quadrics support?

2009-07-01 Thread Michael Di Domenico
Did the quadrics support for OpenMPI ever materialize? I can't find any documentation on the web about it and the few mailing list messages I came across showed some hints that it might be in progress but that was way back in 2007 Thanks