[OMPI users] Strange problem with mpirun and LIGGGHTS after reboot of machine

2016-03-17 Thread Rainer Koenig
Hi, I'm experiencing a strange problem running LIGGGHTS on a 48-core workstation running Ubuntu 14.04.4 LTS. If I cold boot the workstation and start one of the examples from LIGGGHTS, then everything looks fine: $ mpirun -np 48 liggghts < in.chute_wear launches the example on all 48 cores,

Re: [OMPI users] Open SHMEM Error

2016-03-17 Thread RYAN RAY
Dear Gilles, thanks for the reply. Regards, Ryan. On Wed, 16 Mar 2016 11:39:49 +0530, Gilles Gouaillardet wrote: > Ray, from the shmem_ptr man page: RETURN VALUES: shmem_ptr returns a pointer to the data object on the specified remote PE. If target is not
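A minimal C sketch of the behavior that man-page excerpt describes, assuming an OpenSHMEM 1.2-style API (the value 42 and the fallback put are illustrative, not from the thread):

    #include <shmem.h>
    #include <stdio.h>

    int main(void) {
        static int target = 0;                 /* symmetric data object */
        shmem_init();
        int me = shmem_my_pe();
        if (shmem_n_pes() > 1 && me == 0) {
            /* shmem_ptr returns a usable local pointer to the remote
               object, or NULL if PE 1 is not reachable by load/store */
            int *p = (int *) shmem_ptr(&target, 1);
            if (p != NULL)
                *p = 42;                       /* direct store into PE 1 */
            else
                shmem_int_p(&target, 42, 1);   /* fall back to a put */
        }
        shmem_barrier_all();
        printf("PE %d: target = %d\n", me, target);
        shmem_finalize();
        return 0;
    }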

Re: [OMPI users] Strange problem with mpirun and LIGGGHTS after reboot of machine

2016-03-17 Thread Ralph Castain
Just some thoughts offhand: * what version of OMPI are you using? * are you saying that after the warm reboot, all 48 procs are running on a subset of cores? * it sounds like some of the cores have been marked as “offline” for some reason. Make sure you have hwloc installed on the machine,
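One quick way to check whether cores really were marked offline after the warm reboot (hwloc's lstopo and the sysfs files below are standard on Linux; the values in the comments are illustrative):

    $ lstopo --no-io                        # hwloc's view of the machine topology
    $ cat /sys/devices/system/cpu/online    # e.g. 0-47 when all 48 cores are up
    $ cat /sys/devices/system/cpu/offline   # non-empty if cores were left offline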

Re: [OMPI users] Issue about cm PML

2016-03-17 Thread Gilles Gouaillardet
Can you try to add --mca mtl psm to your mpirun command line? You might also have to blacklist the openib btl. Cheers, Gilles. On Thursday, March 17, 2016, dpchoudh . wrote: > Hello all, I have a simple test setup, consisting of two Dell workstation nodes with similar
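Assuming a PSM-capable (QLogic/Intel TrueScale) fabric, the suggested invocation would look roughly like this (./a.out is a placeholder executable):

    $ mpirun --mca mtl psm --mca btl ^openib -np 2 ./a.out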

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Wed, Mar 16, 2016 at 4:49 PM, Cabral, Matias A wrote: > I didn't go into the code to see what is actually emitting this error message, but I suspect this may be a generic error for an "out of memory" kind of thing and not specific to the queue pair. To confirm please

Re: [OMPI users] Issue about cm PML

2016-03-17 Thread Jeff Squyres (jsquyres)
Additionally, if you run "ompi_info | grep psm", do you see the PSM MTL listed? To force the CM MTL, you can run: mpirun --mca pml cm ... That won't let any BTLs be selected (because only ob1 uses the BTLs). > On Mar 17, 2016, at 8:07 AM, Gilles Gouaillardet
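Putting Jeff's two checks together (the exact ompi_info output varies by build; the comments describe what to look for):

    $ ompi_info | grep psm                 # a built PSM MTL appears as an "MCA mtl: psm (...)" line
    $ mpirun --mca pml cm -np 2 ./a.out    # forces CM; aborts if no MTL is usable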

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Dave Love
Michael Di Domenico writes:
> On Wed, Mar 16, 2016 at 12:12 PM, Elken, Tom wrote:
>> Hi Mike,
>> In this file,
>> $ cat /etc/security/limits.conf
>> ...
>> <do you see at the end ...>
>> * hard memlock unlimited
>> * soft memlock unlimited

Re: [OMPI users] running OpenMPI jobs (either 1.10.1 or 1.8.7) on SoGE more problems

2016-03-17 Thread Dave Love
Ralph Castain writes: > That’s an SGE error message - looks like your tmp file system on one of the remote nodes is full. Yes; surely that just needs to be fixed, and I'd expect the host not to accept jobs in that state. It's not just going to break ompi. > We don’t

Re: [OMPI users] Fault tolerant feature in Open MPI

2016-03-17 Thread Dave Love
Husen R writes: > Dear Open MPI Users, does the current stable release of Open MPI (v1.10 series) support the fault tolerance feature? I got the information from the Open MPI FAQ that checkpoint/restart support was last released as part of the v1.6 series. I just want

Re: [OMPI users] Fault tolerant feature in Open MPI

2016-03-17 Thread Ralph Castain
Just to clarify: I am not aware of any MPI that will allow you to relocate a process while it is running. You have to checkpoint the job, terminate it, and then restart the entire thing with the desired process on the new node. > On Mar 16, 2016, at 3:15 AM, Husen R wrote: >

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Gilles Gouaillardet
Also, limits.conf is applied when starting an ssh session; it is not useful for services started at boot time, so "ulimit -l unlimited" should be added in the startup script /etc/init.d/xxx or /etc/sysconfig/xxx. Cheers, Gilles. On Thursday, March 17, 2016, Dave Love wrote: >
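For a boot-time service that would look roughly like the following; the file name is a placeholder and the exact placement varies by distro:

    # e.g. near the top of /etc/init.d/my-mpi-service, before the daemon is launched:
    ulimit -l unlimited    # raise the locked-memory limit for this process tree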

Re: [OMPI users] Fault tolerant feature in Open MPI

2016-03-17 Thread Bland, Wesley
Presumably Adaptive MPI would allow you to do that. I don’t know all the details of how that works there, though.

Re: [OMPI users] Issue about cm PML

2016-03-17 Thread dpchoudh .
Thank you, everybody. With your help I was able to resolve the issue. For the sake of completeness, this is what I had to do: infinipath-psm was already installed on my system when Open MPI was built from source. However, infinipath-psm-devel was NOT installed. I suppose that's why Open MPI could

[OMPI users] How to link the statically compiled OpenMPI library ?

2016-03-17 Thread evelina dumitrescu
Hello, I unsuccessfully tried to link the statically compiled OpenMPI library. I used for compilation: ./configure --enable-static --disable-shared; make -j 4; make install. When I try to link the library to my executable, I get the following error: gcc mm.c --static -I/usr/local/include/openmpi

Re: [OMPI users] How to link the statically compiled OpenMPI library ?

2016-03-17 Thread Jeff Squyres (jsquyres)
On Mar 17, 2016, at 10:54 AM, evelina dumitrescu wrote: > hello, I unsuccessfully tried to link the statically compiled OpenMPI library. I used for compilation: ./configure --enable-static --disable-shared; make -j 4; make install. When I try
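A common fix, assuming the usual Open MPI wrapper compilers were installed: link through mpicc rather than calling gcc directly, so the dependent Open MPI libraries are pulled in with the right flags and in the right order:

    $ mpicc -static -o mm mm.c     # the wrapper adds -lmpi and its dependencies
    $ mpicc --showme:link          # prints the link flags the wrapper would use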

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Cabral, Matias A
I was looking for lines like "[nodexyz:17085] selected cm best priority 40" and "[nodexyz:17099] select: component psm selected". _MAC

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Thu, Mar 17, 2016 at 12:15 PM, Cabral, Matias A wrote: > I was looking for lines like "[nodexyz:17085] selected cm best priority 40" and "[nodexyz:17099] select: component psm selected" I see "cm best priority 20", which seems to relate to ob1 being selected. I

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Jeff Squyres (jsquyres)
Michael -- Can you send all the information listed here? https://www.open-mpi.org/community/help/ (including the full output from the run with the PML/BTL/MTL/etc. verbosity) This will allow Matias to look through all the relevant info, potentially with fewer back-n-forth emails. Thanks!

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Thu, Mar 17, 2016 at 12:52 PM, Jeff Squyres (jsquyres) wrote: > Can you send all the information listed here? https://www.open-mpi.org/community/help/ (including the full output from the run with the PML/BTL/MTL/etc. verbosity) This will allow Matias to look

Re: [OMPI users] Fault tolerant feature in Open MPI

2016-03-17 Thread Xavier Besseron
On Thu, Mar 17, 2016 at 3:17 PM, Ralph Castain wrote: > Just to clarify: I am not aware of any MPI that will allow you to relocate a process while it is running. You have to checkpoint the job, terminate it, and then restart the entire thing with the desired process on the

Re: [OMPI users] locked memory and queue pairs

2016-03-17 Thread Michael Di Domenico
On Thu, Mar 17, 2016 at 12:15 PM, Cabral, Matias A wrote: > I was looking for lines like "[nodexyz:17085] selected cm best priority 40" and "[nodexyz:17099] select: component psm selected" This may have turned up more than I expected. I recompiled Open MPI v1.8.4

[OMPI users] Dynamically throttle/scale processes

2016-03-17 Thread Andrus, Brian Contractor
All, I have an MPI-based program with a master process that acts as a 'traffic cop' in that it hands out work to child processes. I want to be able to dynamically throttle how many child processes are in use at any given time. For instance, if I start it with "mpirun -n 512" I could send

Re: [OMPI users] Dynamically throttle/scale processes

2016-03-17 Thread Ralph Castain
Hmmm….I haven’t heard of that specific use-case, but I have seen some similar things. Did you want the processes to be paused, or killed, when you scale down? Obviously, I’m assuming they are not MPI procs, yes? I can certainly see a way to make mpirun do it without too much fuss, though it

[OMPI users] OpenMPI 1.10.1 *ix hard/soft open files limits >= 4096 still required?

2016-03-17 Thread Lane, William
I remember years ago, OpenMPI (version 1.3.3) required the hard/soft open-files limits to be >= 4096 in order to function when large numbers of slots were requested (with 1.3.3 this happened at roughly 85 slots). Is this requirement still present for OpenMPI versions 1.10.1 and greater? I'm having some
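For reference, checking and raising those limits looks like this (the 4096 values mirror the old requirement):

    $ ulimit -n     # current soft limit on open file descriptors
    $ ulimit -Hn    # hard limit
    # to raise both, add lines like these to /etc/security/limits.conf:
    #   * soft nofile 4096
    #   * hard nofile 4096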

Re: [OMPI users] OpenMPI 1.10.1 *ix hard/soft open files limits >= 4096 still required?

2016-03-17 Thread Ralph Castain
No, that shouldn’t be the issue any more - and that isn’t what the backtrace indicates. It looks instead like there was a problem with the shared memory backing file on a remote node, and that caused the vader shared memory BTL to segfault. Try turning vader off and see if that helps - I’m not

Re: [OMPI users] OpenMPI 1.10.1 *ix hard/soft open files limits >= 4096 still required?

2016-03-17 Thread Lane, William
I apologize, Ralph; I forgot to include my command line for invoking OpenMPI on SoGE: qsub -q short.q -V -pe make 87 -b y mpirun -np 87 --prefix /hpc/apps/mpi/openmpi/1.10.1/ --hetero-nodes --mca btl ^sm --mca plm_base_verbose 5 /hpc/home/lanew/mpi/openmpi/a_1_10_1.out. a_1_10_1.out is my

Re: [OMPI users] OpenMPI 1.10.1 *ix hard/soft open files limits >= 4096 still required?

2016-03-17 Thread Lane, William
I ran OpenMPI using the "-mca btl ^vader" switch Ralph recommended and I'm still getting the same errors: qsub -q short.q -V -pe make 206 -b y mpirun -np 206 --prefix /hpc/apps/mpi/openmpi/1.10.1/ --hetero-nodes --mca btl ^vader --mca plm_base_verbose 5 /hpc/home/lanew/mpi/openmpi/a_1_10_1.out

Re: [OMPI users] Dynamically throttle/scale processes

2016-03-17 Thread Gilles Gouaillardet
Brian, unlike Ralph, I will assume all your processes are MPI tasks. At first glance, the MPI philosophy is the other way around: start with mpirun -np 1 traffic_cop, and then MPI_Comm_spawn("child") when you need more workers. That being said, if you are fine with having idle children
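A minimal C sketch of that spawn-on-demand pattern ("worker" is a hypothetical executable name; error handling omitted):

    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int nworkers = 4;                       /* throttle by changing this */
        MPI_Comm children;
        /* the singleton "traffic cop" spawns workers only when needed */
        MPI_Comm_spawn("worker", MPI_ARGV_NULL, nworkers, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);
        int task = 1;
        MPI_Send(&task, 1, MPI_INT, 0, 0, children);  /* work for worker 0 */
        MPI_Finalize();
        return 0;
    }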

Re: [OMPI users] OpenMPI 1.10.1 *ix hard/soft open files limits >= 4096 still required?

2016-03-17 Thread Ralph Castain
Yeah, it looks like something is wrong with the mmap backend for some reason. It gets used by both vader and sm, so no help there. I’m afraid I’ll have to defer to Nathan from here as he is more familiar with it than I. > On Mar 17, 2016, at 4:55 PM, Lane, William