Hi,
I'm experiencing a strange problem with running LIGGGHTS on 48 core
workstation running Ubuntu 14.04.4 LTS.
If I cold boot the workstation and start one of the examples from
LIGGGHTS then everything looks fine:
$ mpirun -np 48 liggghts < in.chute_wear
launches the example on all 48 cores,
Dear Gilles
Thanks for the reply.
Regards
Ryan
On Wed, 16 Mar 2016 11:39:49 +0530 Gilles Gouaillardet wrote
>
Ray,
from shmem_ptr man page :
RETURN VALUES
shmem_ptr returns a pointer to the data object on the
specified remote PE. If target is not remotely
Just some thoughts offhand:
* what version of OMPI are you using?
* are you saying that after the warm reboot, all 48 procs are running on a
subset of cores?
* it sounds like some of the cores have been marked as “offline” for some
reason. Make sure you have hwloc installed on the machine, and
can you try to add
--mca mtl psm
to your mpirun command line ?
you might also have to blacklist the openib btl
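Putting the two suggestions above together, the command line would look roughly like this (a sketch only; the `-np 48` and the LIGGGHTS input file are taken from the original report, the rest is illustrative):

```shell
# Force the PSM MTL and blacklist the openib BTL
mpirun -np 48 --mca mtl psm --mca btl ^openib liggghts < in.chute_wear
```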
Cheers,
Gilles
On Thursday, March 17, 2016, dpchoudh . wrote:
> Hello all
> I have a simple test setup, consisting of two Dell workstation nodes with
> similar hardware profile.
>
>
On Wed, Mar 16, 2016 at 4:49 PM, Cabral, Matias A
wrote:
> I didn't go into the code to see who is actually calling this error message,
> but I suspect this may be a generic error for "out of memory" kind of thing
> and not specific to the queue pair. To confirm please add -mca
> pml_base_verbose
Additionally, if you run
ompi_info | grep psm
Do you see the PSM MTL listed?
To force the CM MTL, you can run:
mpirun --mca pml cm ...
That won't let any BTLs be selected (because only ob1 uses the BTLs).
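The two checks above could be run as follows (a sketch; `./my_app` is a placeholder for your own binary):

```shell
# Check whether the PSM MTL was built into this Open MPI install
ompi_info | grep psm

# Force the cm PML, which uses MTLs such as PSM instead of the BTLs
mpirun --mca pml cm -np 48 ./my_app
```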
> On Mar 17, 2016, at 8:07 AM, Gilles Gouaillardet
> wrote:
>
> can you try to a
Michael Di Domenico writes:
> On Wed, Mar 16, 2016 at 12:12 PM, Elken, Tom wrote:
>> Hi Mike,
>>
>> In this file,
>> $ cat /etc/security/limits.conf
>> ...
>> < do you see at the end ... >
>>
>> * hard memlock unlimited
>> * soft memlock unlimited
>> # -- All InfiniBand Settings End here --
>> ?
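To confirm the memlock settings above actually took effect in the environment where the MPI job runs (limits set in limits.conf only apply to new login sessions), one can check:

```shell
# Print the locked-memory limit for the current shell;
# it should report "unlimited" if limits.conf is applied
ulimit -l
```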
Ralph Castain writes:
> That’s an SGE error message - looks like your tmp file system on one
> of the remote nodes is full.
Yes; surely that just needs to be fixed, and I'd expect the host not to
accept jobs in that state. It's not just going to break ompi.
> We don’t control where SGE puts it
Husen R writes:
> Dear Open MPI Users,
>
>
> Does the current stable release of Open MPI (v1.10 series) support the fault
> tolerance feature?
> I got the information from the Open MPI FAQ that the checkpoint/restart support
> was last released as part of the v1.6 series.
> I just want to make sure about
Just to clarify: I am not aware of any MPI that will allow you to relocate a
process while it is running. You have to checkpoint the job, terminate it, and
then restart the entire thing with the desired process on the new node.
> On Mar 16, 2016, at 3:15 AM, Husen R wrote:
>
> In the case of
also, limits.conf is set when starting an ssh session.
it is not useful for services started at boot time, and
ulimit -l unlimited
should be added to the startup script
/etc/init.d/xxx
or
/etc/sysconfig/xxx
Cheers,
Gilles
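A minimal sketch of what that startup-script change might look like (the daemon path and script name are illustrative, not from the original thread):

```shell
# In the service's init script (e.g. /etc/init.d/sge_execd),
# raise the locked-memory limit before launching the daemon
ulimit -l unlimited
exec /path/to/daemon
```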
On Thursday, March 17, 2016, Dave Love wrote:
> Michael Di Domenico writes:
Presumably Adaptive MPI would allow you to do that. I don’t know all the
details of how that works there though.
From: users on behalf of Ralph Castain
Reply-To: Open MPI Users
Date: Thursday, March 17, 2016 at 9:17 AM
To: Open MPI Users
Subject: Re: [OMPI users] Fault tolerant feature in Op
Thank you everybody. With your help I was able to resolve the issue. For
the sake of completeness, this is what I had to do:
infinipath-psm was already installed in my system when OpenMPI was built
from source. However, infinipath-psm-devel was NOT installed. I suppose
that's why openMPI could not
hello,
I unsuccessfully tried to link the statically compiled OpenMPI library.
I used for compilation:
./configure --enable-static --disable-shared
make -j 4
make install
When I try to link the library to my executable, I get the following error:
gcc mm.c --static -I/usr/local/include/openmpi -o mm
On Mar 17, 2016, at 10:54 AM, evelina dumitrescu
wrote:
>
> hello,
>
> I unsuccessfully tried to link the statically compiled OpenMPI library.
> I used for compilation:
>
> ./configure --enable-static --disable-shared
> make -j 4
> make install
>
> When I try to link the library to my executab
Instead of --static try using -Wl,-Bstatic. I do not think you can
safely mix --static with -Wl,-Bdynamic.
-Nathan
HPC-ENV, LANL
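Nathan's suggestion might look roughly like this (a sketch under assumptions: the source file name, install paths, and the exact trailing system libraries are illustrative and depend on how Open MPI was configured):

```shell
# Link the MPI library statically while keeping system libraries dynamic,
# instead of using --static for everything
gcc mm.c -I/usr/local/include -L/usr/local/lib \
    -Wl,-Bstatic -lmpi -Wl,-Bdynamic -lpthread -ldl -lrt -o mm
```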
On Thu, Mar 17, 2016 at 03:54:33PM +0100, evelina dumitrescu wrote:
>hello,
>
>I unsuccessfully tried to link the statically compiled OpenMPI library.
>I
I was looking for lines like "[nodexyz:17085] selected cm best priority 40" and
"[nodexyz:17099] select: component psm selected"
_MAC
-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Michael Di Domenico
Sent: Thursday, March 17, 2016 5:52 AM
To: Open MPI
On Thu, Mar 17, 2016 at 12:15 PM, Cabral, Matias A
wrote:
> I was looking for lines like "[nodexyz:17085] selected cm best priority 40"
> and "[nodexyz:17099] select: component psm selected"
i see cm best priority 20, which seems to relate to ob1 being
selected. i don't see a mention of psm a
Michael --
Can you send all the information listed here?
https://www.open-mpi.org/community/help/
(including the full output from the run with the PML/BTL/MTL/etc. verbosity)
This will allow Matias to look through all the relevant info, potentially with
fewer back-n-forth emails.
Thanks!
On Thu, Mar 17, 2016 at 12:52 PM, Jeff Squyres (jsquyres)
wrote:
> Can you send all the information listed here?
>
> https://www.open-mpi.org/community/help/
>
> (including the full output from the run with the PML/BTL/MTL/etc. verbosity)
>
> This will allow Matias to look through all the rele
On Thu, Mar 17, 2016 at 3:17 PM, Ralph Castain wrote:
> Just to clarify: I am not aware of any MPI that will allow you to relocate a
> process while it is running. You have to checkpoint the job, terminate it,
> and then restart the entire thing with the desired process on the new node.
>
Dear a
On Thu, Mar 17, 2016 at 12:15 PM, Cabral, Matias A
wrote:
> I was looking for lines like "[nodexyz:17085] selected cm best priority 40"
> and "[nodexyz:17099] select: component psm selected"
this may have turned up more than i expected. i recompiled openmpi
v1.8.4 as a test and reran the test
All,
I have an mpi-based program that has a master process that acts as a 'traffic
cop' in that it hands out work to child processes.
I want to be able to dynamically throttle how many child processes are in use
at any given time.
For instance, if I start it with "mpirun -n 512" I could send a
Hmmm….I haven’t heard of that specific use-case, but I have seen some similar
things. Did you want the processes to be paused, or killed, when you scale
down? Obviously, I’m assuming they are not MPI procs, yes?
I can certainly see a way to make mpirun do it without too much fuss, though it
wou
I remember years ago, OpenMPI (version 1.3.3) required the hard/soft open
files limits to be >= 4096 in order to function when large numbers of slots
were requested (with 1.3.3 this was at roughly 85 slots). Is this requirement
still present for OpenMPI versions 1.10.1 and greater?
I'm having some is
No, that shouldn’t be the issue any more - and that isn’t what the backtrace
indicates. It looks instead like there was a problem with the shared memory
backing file on a remote node, and that caused the vader shared memory BTL to
segfault.
Try turning vader off and see if that helps - I’m not
I apologize Ralph, I forgot to include my command line for invoking OpenMPI on
SoGE:
qsub -q short.q -V -pe make 87 -b y mpirun -np 87 --prefix
/hpc/apps/mpi/openmpi/1.10.1/ --hetero-nodes --mca btl ^sm --mca
plm_base_verbose 5 /hpc/home/lanew/mpi/openmpi/a_1_10_1.out
a_1_10_1.out is my OpenMP
I ran OpenMPI using the "-mca btl ^vader" switch Ralph recommended and I'm
still getting the same errors
qsub -q short.q -V -pe make 206 -b y mpirun -np 206 --prefix
/hpc/apps/mpi/openmpi/1.10.1/ --hetero-nodes --mca btl ^vader --mca
plm_base_verbose 5 /hpc/home/lanew/mpi/openmpi/a_1_10_1.out
Brian,
unlike Ralph, i will assume all your processes are MPI tasks.
at first glance, the MPI philosophy is the other way around:
start with mpirun -np 1 traffic_cop, and then MPI_Comm_spawn("child")
when you need more workers.
that being said, if you are fine with having idle children (e.g.
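Gilles's spawn-on-demand approach could be sketched roughly as follows (an untested illustration, not from the thread; the "child" executable name and worker count are placeholders):

```c
/* traffic_cop.c -- sketch: spawn workers on demand via MPI_Comm_spawn */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int nworkers = 4;        /* decided dynamically at run time */
    MPI_Comm workers;        /* intercommunicator to the spawned children */
    int errcodes[4];         /* one entry per spawned process */

    /* "child" is a placeholder for the worker executable */
    MPI_Comm_spawn("child", MPI_ARGV_NULL, nworkers, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &workers, errcodes);

    /* ... hand out work over the intercommunicator with MPI_Send/MPI_Recv ... */

    MPI_Comm_disconnect(&workers);
    MPI_Finalize();
    return 0;
}
```

This is started with `mpirun -np 1 ./traffic_cop`, and the number of live workers is controlled by how many children the master spawns (or disconnects from) over time.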
Yeah, it looks like something is wrong with the mmap backend for some reason.
It gets used by both vader and sm, so no help there.
I’m afraid I’ll have to defer to Nathan from here as he is more familiar with
it than I.
> On Mar 17, 2016, at 4:55 PM, Lane, William wrote:
>
> I ran OpenMPI us